ABSTRACT


This thesis applies exploratory data analysis methods to the question of categorizing pilots and relating these categories to accident potential. The routinely recorded flight data deal with the pilots' total flight experience, recency, and frequency of flying. The purpose of categorizing is to determine whether the recorded flight data can help discriminate between two original sample groups of fifty pilots each: those pilots with accidents during FY73 and those without.

The technique of linear discriminant analysis indicated that there is a significant difference between the mean vectors of flight data for the two groups. The computed discriminant function produced an empirical correct classification rate of 81%. Techniques of cluster analysis (with the aid of principal components analysis) are also employed to detect patterns or differences in the data. Curiously, the amount of time flown in the last 48 hours is associated with relatively low accident potential, whereas time flown in the last 24 hours seems to be correlated with a higher accident potential.

I. INTRODUCTION AND OBJECTIVES

Aviation safety in the United States Navy has always received considerable attention. With the rapidly increasing costs of naval aircraft and of training naval aviators, it is imperative that every possible aspect of aviation safety be thoroughly investigated. It is important to search all paths which may yield any information at all bearing on aircraft accident causation or prevention.

Over the last five years, approximately fifty percent of the major and minor aircraft accidents in the Navy have included pilot error either as the primary factor or as a contributing factor.

Many causes of pilot error accidents have been proposed, ranging from plain lack of physical coordination to mental incompetence. A general term which relates to both physical and mental abilities is experience: as flying experience increases, the learning process should improve both of these abilities. Another general term which affects these two abilities is proficiency: the recency and frequency of flying should also have a direct bearing on them.

This thesis explores methods for classifying or categorizing pilots according to variables associated with their experience and proficiency. Accident records are used to determine whether there is any relation between the classifications and the occurrence of accidents.

One would like to know whether a pilot's experience and proficiency data indicate a high or a low accident potential. Of specific interest is the question of whether pilot error accidents are related to lack of total flying experience, lack of experience in type of aircraft, or lack of practice due to insufficient current flying. If an individual were classified as having a high accident potential, corrective action could then be taken to reduce this potential.

Only the pilots of Navy fixed-wing aircraft are studied; Marine and helicopter pilots are not included. The study encompasses the accidents that occurred during fiscal year 1973. Unfortunately, the data base contains the records of only fifty aviators who were involved in pilot error accidents. Fifty other pilots were selected as a control. Even with these small numbers, a result appeared that may be worth pursuing further: recency of flying may be overdone. The amount of time flown in the last 48 hours is positively correlated with low accident potential, but a reversal seems to take place when looking at the time flown in the last 24 hours.


II. FACTORS INFLUENCING EXPERIENCE AND PROFICIENCY

There are many factors affecting experience and proficiency. Situations encountered, crises faced, types of missions flown, and many other qualitative factors have a definite bearing. However, the only factors considered here are quantitative variables which can be obtained from accident records and IFARS (Individual Flight Activity Reporting System) pilot records.

The Naval Safety Center at Norfolk, Virginia maintains records of all accidents in which naval aircraft are involved.

The recorded data items which reflect a pilot's total experience are the following:

Number of years designated a naval aviator
Total flying hours
Total flying hours in the model aircraft in which the accident occurred
Total day carrier landings
Total night carrier landings

The data items which reflect his proficiency (i.e. his recency and frequency of flying) are the following:

Time all series this aircraft in last 90 days
Time this model this aircraft in last 90 days
Elapsed time since last previous flight
Time flown in the last 24 hours
Time flown in the last 48 hours
Number of missions flown in the last 24 hours
Number of missions flown in the last 48 hours
Number day carrier landings in last 30 days
Number night carrier landings in last 30 days
Instrument trainer time in last 90 days
Weapons system trainer time in last 90 days

The Individual Flight Activity Reporting System (IFARS), a part of the Naval Safety Center, maintains flight records on all naval aviators by fiscal year. The only data items pertaining to pilot experience which are retrievable by computer for all fiscal years are:

Number of years designated a naval aviator
Total flying hours

At present, the following additional experience items are retrievable by computer only from the beginning of fiscal year 1969, and thus cannot be used as comparison variables, since many of the aviators in both sample groups began flying prior to 1969:

Total time by model
Day and night carrier landings by model
Other type landings by model
Instrument time by model

A new compilation is now in progress by the IFARS section at the Naval Safety Center to record all flights on computer files for all fiscal years for all pilots, so that future studies can be more encompassing.

The proficiency indicator data items for the pilots in the accident group have a natural base point from which to be measured. That is, an item such as "time flown in the last 48 hours" means the last 48 hours directly prior to the accident in which the pilot was involved. For the non-accident (control) group, however, there is no such reference point from which to measure, so a comparison of proficiency data items becomes rather nebulous.

One reasonable way to give meaning to the term proficiency for the control group is to construct comparable data items artificially by an averaging procedure. For example, prior to each flight (for the period in question), compute the time flown in the preceding 48 hours. Do this for every flight during the fiscal year and then obtain the average time flown in the preceding 48 hours. The necessary data can be obtained from a detailed flight listing for the pilots in the control group for FY73. This procedure can be used for the following data items:

Time all series this aircraft last 90 days
Elapsed time since last previous flight
Time flown in the last 24 hours
Time flown in the last 48 hours
Number of missions flown in the last 24 hours
Number of missions flown in the last 48 hours
Number day carrier landings in last 30 days
Number night carrier landings in last 30 days
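The averaging procedure can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure actually run on the FY73 listings: each flight in a hypothetical log is recorded as a start time and a flight duration in hours, and a flight counts toward a window if it started within that many hours before the flight being examined (the thesis does not state its boundary convention). The `Flight` record and the helper names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Flight:
    start: float     # hours since the start of the fiscal year (assumed encoding)
    duration: float  # flight time in hours


def hours_in_window(log, i, window):
    """Hours flown in flights that started within `window` hours before flight i."""
    t = log[i].start
    return sum(f.duration for f in log[:i] if t - window <= f.start < t)


def missions_in_window(log, i, window):
    """Number of flights that started within `window` hours before flight i."""
    t = log[i].start
    return sum(1 for f in log[:i] if t - window <= f.start < t)


def averaged(log, per_flight, window):
    """Evaluate a per-flight quantity before every flight and average the results."""
    log = sorted(log, key=lambda f: f.start)
    values = [per_flight(log, i, window) for i in range(len(log))]
    return sum(values) / len(values)


# Hypothetical four-flight log for one control-group pilot.
log = [Flight(0, 2.0), Flight(20, 1.5), Flight(30, 2.0), Flight(100, 1.0)]
avg_time_48 = averaged(log, hours_in_window, 48)          # (0 + 2 + 3.5 + 0) / 4
avg_missions_48 = averaged(log, missions_in_window, 48)   # (0 + 1 + 2 + 0) / 4
avg_time_24 = averaged(log, hours_in_window, 24)          # (0 + 2 + 1.5 + 0) / 4
```

The elapsed time since the last previous flight could be averaged in the same way, skipping the first flight of the year, which has no predecessor in the listing.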

With these artificially constructed data items one can include proficiency in the comparison between the control group and the accident group. The appropriateness of doing this can be determined by comparing the results of statistical analyses performed with and without these added variables. If the added variables give a better delineation between groups, then it is appropriate to include them.

To recap, the variables which are common to both groups and which are used for the analysis are:

Number of years designated a naval aviator (x1)
Total flying hours (x2)
Time all series this aircraft last 90 days (x3)
Time since last previous flight (x4)
Time flown in the last 24 hours (x5)
Time flown in the last 48 hours (x6)
Number of missions flown in the last 24 hours (x7)
Number of missions flown in the last 48 hours (x8)
Number of day carrier landings in last 30 days (x9)
Number of night carrier landings in last 30 days (x10)

III. SELECTION OF GROUPS

The accident group was composed of all those pilots who were involved in pilot error accidents during fiscal year 1973. This group comprised 66 different pilots; no pilot had more than one accident attributable to pilot error. Because of incomplete data in two cases, this was reduced to 64 pilots.

The control group was more difficult to establish, since there were several thousand aviators from which to choose. A subset of these pilots was obtained that satisfied two criteria: (1) it appeared to be a sample representative of all naval aviators, and (2) the data were relatively easy to obtain. The sample taken was the first 100 aviators on the IFARS files. The IFARS files are ordered by increasing social security number, and the increments between successive numbers were very large; examination of the biographical data leads us to believe that social security numbers had no bearing upon age, length of time in aviation duties, or even length of time in the Naval Service. There was no obvious reason to think that the sample was unrepresentative.

Of the 100 pilots initially assigned to the control group, 20 were helicopter pilots and 15 were Naval Flight Officers, leaving 65 subjects in the control group. Since the sizes of the two groups under study are arbitrary, a further reduction in the size of each group was made to meet a computational constraint imposed by a computer program employed in the actual analysis. Because of the extensive computational effort required by the analytical techniques used, a digital computer was mandatory, and one of the computer programs used for the analysis had a limit of 100 data units. Therefore, a random selection of 50 subjects was made for each of the two groups under study. (The random selection was accomplished in the manner of drawing numbers out of a hat.)

IV. INVESTIGATIVE APPROACHES

The data describing the subjects consist of ten pieces of information for each subject, which constitutes a multivariate data set; some sort of multivariate statistical technique is therefore appropriate. Which statistical techniques to employ depends upon the information to be obtained from the analysis, and is the primary concern of this section.

As stated in the introduction, one of the primary objectives is to establish a classification scheme and then to determine whether this classification is related to the occurrence of accidents. One statistical procedure which treats this problem is discriminant analysis. Discriminant analysis is a multivariate statistical technique used for constructing decision rules by which data units (subjects, or pilots in the present context) can be classified as members of one group or another. The goal is to assign subjects to the groups they most resemble, based upon a profile of their characteristics, while at the same time minimizing the effects of misclassification.²


¹ Anderberg, M.R., Cluster Analysis for Applications, p. 191, Academic Press, Inc., 1973.

² Eisenbeis, R.A., and Avery, R.B., Discriminant Analysis and Classification Procedures, p. 3, Lexington Books, 1972.

The procedure constructs a discriminant function based upon input data in which subjects are members of known groups. This discriminant function is usually linear but can be quadratic or have other forms. The data are used to make the function specific (that is, to determine its parameters). Typically, the function is then used to reassign the original subjects to one of the two groups on the basis of their characteristics, in order to make an empirical determination of the rate of misclassification. If all subjects are reassigned to the groups from which they initially came, then the misclassification rate is zero and discrimination between the groups is perfect. The discriminant function can also be used to categorize other observations (subjects), whose group membership is unknown, on the basis of their attributes.

If several (more than two) groups are present, then a set of discriminant functions is constructed to assign observations to the appropriate groups.

A linear discriminant function will be constructed for the two pilot groups on the basis of their experience and proficiency characteristics. If the function discriminates well, one can determine which particular characteristics have the strongest influence on placing a subject in the accident group. Also, by applying the discriminant function to subjects not in the original test groups, one can determine their accident potential.


³ Press, S.J., Applied Multivariate Analysis, pp. 376-379, Holt, Rinehart and Winston, Inc., 1972.

The assumptions upon which discriminant analysis is based, and the actual mathematics, will be covered in the next section.

If the discriminant function fails to separate the groups without a high rate of misclassification, the lack of success can be attributed to one of two causes. The first is that the variables characterizing the subjects do not distinguish between the groups strongly enough, or that the groups overlap too much in the given measurement space. The second is that the groups cannot be separated by a function of the form chosen for the analysis; that is, perhaps a quadratic or more complex discriminant function is needed instead of a linear one.

To illustrate the preceding concept, let the accident group be denoted by "A" and the control group by "C". If one considers the groups in two dimensions only (instead of the actual ten), the groups might be clumped as in Figure (1).

Figure (1)

In this case a linear discriminant function would serve to separate the groups well, and it is not necessary to construct a quadratic function. If, however, the data appeared as in Figure (2), then one can see that a linear discriminant function cannot discriminate between the groups without error, whereas a quadratic discriminant function, such as the curve depicted, might very well have excellent discriminating capabilities.

Figure (2)

The linear discriminant function is a tool that is immediately available in terms of computer programs. It is based upon the assumption that the data came from multivariate normal populations with equal covariance matrices, and when this assumption is met, it works as well as any other discriminant function. Other discriminant functions are not readily available for use. Also, the linear discriminant function can do a good job even if the multivariate normal assumption is not met, i.e. when the natural separation of the groups is so great that even a simple method would do the job.

For the problem at hand, the results of the linear discriminant function were encouraging, but since the assumption of multivariate normality is not appropriate (e.g. rotation policies split variables x1 and x2 so that their distributions are multimodal), it was decided to explore the nature of the data to see if a better job could be done.

Exploratory data analysis on 100 points in Euclidean 10-space is not easy. Some form of cluster analysis, that is, clustering the subjects into groups, is called for. This leads to the questions of how many groups are actually present and how the data are grouped.

Cluster analysis is actually a collection of techniques that are used to group multidimensional entities according to various criteria of their degrees of homogeneity or heterogeneity.⁵ For example, in this problem grouping will be on the basis of the values of each variable which describes the pilot's flight experience and proficiency. Pilots with high total flight time might tend to cluster into one group, while pilots with few carrier landings or with little time since last flight might tend to cluster into other groups. How close the values of the variables must be before subjects are grouped into the same cluster is a question of the degree of homogeneity desired, and how many clusters there should be is a question of the degree of heterogeneity desired. This type of grouping is called grouping by subjects; that is, the entities are subjects. The entities can also be the variables themselves, in which case the clustering is said to be by attributes.

There are several pertinent questions to bear in mind when performing a cluster analysis. How many clusters are inherent in the data? Since attributes may be measured in different units, should the attributes be standardized before they are clustered? How large should the errors be before they are considered intolerable? There will be one type of error made by not assigning similar entities to the same group, and another type made by grouping dissimilar entities into the same cluster. Should all possible pairs of points (or attributes) be scrutinized for similarities? Not all of these questions have definite answers, but they will be addressed in the next section.

In most other statistical techniques, such as analysis of variance, the observations usually possess some a priori structure of belonging to particular populations. Consequently, it is often possible to assume particular distributions for the populations and make associated inferences. In clustering problems, however, the principal concern is how to establish appropriate populations. Thus, cluster analysis logically precedes the application of most other multivariate procedures when the data do not possess structured form.⁷

There are two possible approaches to clustering: enumerative and non-enumerative procedures. Enumerative means simply to list all the possible groupings of subjects (or of attributes, if the clustering is by this form of entities). The number of possible groupings is given by a Stirling number of the second kind. For example, in clustering twenty-five subjects into five groups there are between two and three quadrillion possibilities from which to choose the best grouping.⁸ This is not feasible even with a computer, especially when the problem is much larger than this. Some feasible non-enumerative techniques are described in the next section.
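The size of the enumerative search is easy to check. A short Python sketch (the function name is ours) builds the Stirling number of the second kind from its standard recurrence, S(n, k) = k·S(n - 1, k) + S(n - 1, k - 1), using exact integer arithmetic:

```python
def stirling2(n, k):
    """Stirling number of the second kind: ways to split n items into k non-empty groups."""
    # Build S(i, j) row by row with S(i, j) = j * S(i-1, j) + S(i-1, j-1).
    row = [1] + [0] * k  # S(0, 0) = 1 and S(0, j) = 0 for j > 0
    for i in range(1, n + 1):
        new = [0] * (k + 1)  # S(i, 0) = 0 for i >= 1
        for j in range(1, min(i, k) + 1):
            new[j] = j * row[j] + row[j - 1]
        row = new
    return row[k]


count = stirling2(25, 5)
print(f"{count:,}")
```

For twenty-five subjects in five groups the count is 2,436,684,974,110,751, about 2.4 quadrillion, which agrees with the figure quoted above; allowing any number of groups would make the enumeration larger still.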

If through the use of cluster analysis one can find a feasible set of groupings that have meaning for this problem, then the groupings can be analyzed by a discriminant analysis to obtain the desired classification procedure.

Clustering by variables can also prove worthwhile, in that it can help to determine whether some of the variables are redundant and provide no additional information. If so, those redundant variables can be eliminated or combined, thus simplifying the required computations. One method of combining variables which was used is principal components analysis.

⁷ Ibid.

Multivariate analysis by the principal components model attempts to reduce the dimension of the problem while retaining as much of the information (i.e. variation) contained in the original data as possible. The method produces linear combinations of the original variables which maximize the variance of the resultant weighted sum, subject to the squared weights summing to one (otherwise the variance could be made arbitrarily large). Through this assignment of the weights, attention is centered primarily on the variables with the greater variability. This linear combination of the variables is called the first principal component, and it reduces the set of original variables to one variable. If it is desired to extract more variance from the data, one can construct a second principal component which is orthogonal to the first. The process can be repeated until there are as many components as original variables, at which point one hundred percent of the total variance has been extracted.⁹
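A minimal NumPy sketch of this construction, offered as an illustration rather than the program used in the thesis: the unit-length weights are eigenvectors of the sample covariance matrix, ordered by eigenvalue, so each component's variance equals its eigenvalue and the weight vectors are mutually orthogonal. Whether the thesis worked from the covariance or the correlation matrix is not stated here; standardizing the columns first (the correlation-matrix version) is an assumption worth considering when variables are in different units, such as years and hours. The data below are simulated.

```python
import numpy as np


def principal_components(X):
    """Principal components of the columns of X (rows are subjects)."""
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: symmetric input, ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                   # component scores for each subject
    explained = eigvals / eigvals.sum()     # share of total variance per component
    return eigvals, eigvecs, scores, explained


# Hypothetical data: 100 subjects, 4 correlated variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
eigvals, weights, scores, explained = principal_components(X)
print(explained.cumsum())  # cumulative share; the last entry is 1.0 up to rounding
```

To form a composite such as the "frequency" variable, one would pass only the columns for x5 through x8 and keep `scores[:, 0]`, the first principal component.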

The objective of principal components analysis is not merely to reduce the size and complexity of the problem, but also to glean information from the data which might not otherwise be obvious. Specifically, in the problem under study here, the fifth, sixth, seventh and eighth variables (x5 through x8 in the recap list at the end of Section II) can be regarded as prime indicators of frequency of flying. However, when the data are analyzed by cluster analysis, the exact effects of these variables might not be readily apparent. When all these variables are combined into one variable (i.e. the first principal component), the effect of frequency might be quite obvious; it might be observed, for instance, that frequency of flying has an inverse relationship with the occurrence of accidents. For this analysis, of the variables in that list, the first and second (years designated naval aviator and total hours flown) were combined to get a "total experience" variable, and the fifth, sixth, seventh and eighth (time flown in the last 24 hours, time flown in the last 48 hours, number of missions flown in the last 24 hours, and number of missions flown in the last 48 hours) were combined to get a "frequency" variable.

V. ANALYTICAL TECHNIQUES

The first analytic technique applied to the two groups of pilots was discriminant analysis, with the primary objective of developing an accurate linear discriminant function. (Actually, the purposes of discriminant analysis are, first, to determine whether there is a difference among population means, or equivalently whether there are any overlaps among the groups, and second, to construct classification schemes based upon the descriptive variables.)

There are three basic underlying assumptions of discriminant analysis: (1) that the groups being investigated are discrete and identifiable, (2) that each observation (subject) in each group can be described by a set of measurements on m characteristics or variables, and (3) that these m variables have a multivariate normal distribution in each population, with equal covariance matrices among populations. The first two assumptions are satisfied, as discussed in previous sections. The third assumption calls for separate statistical tests to determine whether the variables are multivariate normal and whether the covariance matrices are equal. It has been mentioned that non-normal multivariate data do not necessarily bias the results of a discriminant analysis. Also, since no satisfactory tests exist for testing populations to be multivariate normal, it is difficult to test the normality assumption routinely. Finally, the central limit theorem suggests that as the number of observations increases, the discriminant values for each group approach a normal distribution.¹⁰

The assumption of equality of covariance matrices (i.e. equality of within-group dispersions) appears to be more critical in biasing the results. Eisenbeis and Avery suggest that linear classification rules are not adequate when unequal covariance matrices exist, and that quadratic classification rules should then be employed.

The within-group dispersion matrices for the two groups of data were computed and are shown in Table VI in Appendix C. The pooled within-groups dispersion matrix is also shown. The group dispersion matrices were tested for equality by the procedure given in Appendix D.

After examining the assumptions preparatory to the actual analysis, one can first test the equality of group means. The null hypothesis is

$$ H_0:\ \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2 $$

where

$$ \boldsymbol{\mu}_1 = (\mu_{1,1}, \mu_{1,2}, \ldots, \mu_{1,10})' \quad \text{and} \quad \boldsymbol{\mu}_2 = (\mu_{2,1}, \mu_{2,2}, \ldots, \mu_{2,10})' $$

are the population mean vectors of the two groups.


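The following steps are used by the BIMED04M computer program to test for the equality of group means.

Step (1): the means for each group are computed,

$$ \bar{\mathbf{x}}_i = (\bar{x}_{i,1}, \bar{x}_{i,2}, \ldots, \bar{x}_{i,10})', \qquad i = 1, 2 $$

Step (2): the differences in group means are computed,

$$ \bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2 = (\bar{x}_{1,1} - \bar{x}_{2,1}, \ldots, \bar{x}_{1,10} - \bar{x}_{2,10})' $$

Step (3): the matrices $S^1$ and $S^2$ are computed, where an element of $S^i$ is given by

$$ s^{i}_{u,v} = \sum_{j=1}^{n_i} (x_{ij,u} - \bar{x}_{i,u})(x_{ij,v} - \bar{x}_{i,v}), \qquad i = 1, 2;\quad u = 1, 2, \ldots, 10;\quad v = 1, 2, \ldots, 10 $$

and $x_{ij,u}$ denotes the value of variable $u$ for subject $j$ in group $i$.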
Step (4): the matrix $A$ is computed,

$$ A = S^1 + S^2 $$

where $(a^{u1}, a^{u2}, \ldots, a^{u10})$ is the $u$-th row of $A^{-1}$.

Step (5): the Mahalanobis $D^2$ statistic is computed,

$$ D^2 = (n_1 + n_2 - 2) \sum_{u=1}^{m} \sum_{v=1}^{m} a^{uv} (\bar{x}_{1,u} - \bar{x}_{2,u})(\bar{x}_{1,v} - \bar{x}_{2,v}) $$

Step (6): the $F$ statistic is computed,

$$ F = \frac{n_1 n_2 (n_1 + n_2 - m - 1)}{m (n_1 + n_2)(n_1 + n_2 - 2)} \, D^2 \sim F_{m,\, n_1 + n_2 - m - 1} $$

where $n_1$ and $n_2$ are the respective sizes of the two groups and $m$ is the number of variables.¹²


The null hypothesis can be rejected when the value of the test statistic is greater than the tabled value of F for the desired level of significance.
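The six steps can be checked with a short NumPy sketch. It follows the formulas above, with S¹ and S² as sums of squares and cross-products, so that (n1 + n2 - 2)A⁻¹ is the inverse of the pooled within-groups covariance matrix. It is not the BIMED04M code; the function name and the simulated data are hypothetical, and the tabled F value still has to be looked up (or computed with a statistics library).

```python
import numpy as np


def two_group_mean_test(X1, X2):
    """Steps (1)-(6): Mahalanobis D^2 and the F statistic for H0: mu1 = mu2.

    X1 and X2 hold one row per subject and one column per variable.
    """
    n1, m = X1.shape
    n2 = X2.shape[0]
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)   # Step (1): group means
    d = xbar1 - xbar2                                  # Step (2): mean differences
    S1 = (X1 - xbar1).T @ (X1 - xbar1)                 # Step (3): SSCP matrices
    S2 = (X2 - xbar2).T @ (X2 - xbar2)
    A = S1 + S2                                        # Step (4)
    D2 = (n1 + n2 - 2) * (d @ np.linalg.solve(A, d))   # Step (5): d' A^{-1} d, scaled
    F = n1 * n2 * (n1 + n2 - m - 1) / (m * (n1 + n2) * (n1 + n2 - 2)) * D2  # Step (6)
    return D2, F, (m, n1 + n2 - m - 1)


# Hypothetical data shaped like the study: 50 subjects per group, 10 variables.
rng = np.random.default_rng(1)
X1 = rng.normal(size=(50, 10))
X2 = rng.normal(loc=0.3, size=(50, 10))
D2, F, df = two_group_mean_test(X1, X2)
print(df)  # (10, 89)
```

With two groups of 50 subjects and ten variables, the statistic is referred to an F distribution with 10 and 89 degrees of freedom.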

The construction of the discriminant function is predicated upon minimizing the effects of misclassification and assigning subjects to the group which they most resemble. The effects of misclassification depend upon the a priori probabilities of group membership and the costs or penalties of misclassification. The BIMED programs assume no special a priori probabilities of group membership, i.e. the probability of belonging to either group (in the two-group case) is one-half. They also assume the costs of misclassification to be equal, i.e. the cost of assigning an actual member of group one to group two is the same as that of assigning a member of group two to group one.

The measure of resemblance is determined by the m characteristics which describe each subject. By substituting the values of the characteristics into each group's probability density function, one determines how closely the subject resembles that group as compared with the rest of the population. The BIMED programs yield the coefficients and constants of the linear discriminant function for each group in the total population.¹¹
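A minimal sketch, assuming the classical linear classification functions for normal populations with a common covariance matrix: with equal priors and equal costs, as the BIMED programs assume, group i's function is c_i(x) = x'W⁻¹x̄_i - ½ x̄_i'W⁻¹x̄_i, where W is the pooled within-groups covariance matrix, and a subject is assigned to the group with the larger value. Reassigning the original subjects this way gives the empirical (resubstitution) classification rate described in Section IV; that rate tends to be optimistic because the same data fit the function. The names and simulated data are hypothetical, not the BIMED output.

```python
import numpy as np


def fit_linear_classifiers(X1, X2):
    """Coefficient vector and constant of one linear classification function per group."""
    means = [X1.mean(axis=0), X2.mean(axis=0)]
    A = sum((X - mu).T @ (X - mu) for X, mu in zip((X1, X2), means))
    W = A / (len(X1) + len(X2) - 2)                          # pooled within-groups covariance
    coefs = [np.linalg.solve(W, mu) for mu in means]         # W^{-1} xbar_i
    consts = [-0.5 * mu @ b for mu, b in zip(means, coefs)]  # -1/2 xbar_i' W^{-1} xbar_i
    return np.array(coefs), np.array(consts)


def classify(X, coefs, consts):
    """Assign each row of X to group 0 or 1, whichever function value is larger."""
    return np.argmax(X @ coefs.T + consts, axis=1)


rng = np.random.default_rng(2)
X1 = rng.normal(size=(50, 10))           # hypothetical accident group
X2 = rng.normal(loc=0.5, size=(50, 10))  # hypothetical control group
coefs, consts = fit_linear_classifiers(X1, X2)
labels = np.r_[np.zeros(50, dtype=int), np.ones(50, dtype=int)]
predicted = classify(np.vstack([X1, X2]), coefs, consts)
correct_rate = np.mean(predicted == labels)  # empirical correct classification rate
```

Unequal priors would add the logarithm of each group's prior probability to its constant; unequal costs would shift the comparison in the same way.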

In order to determine what effect the chosen variables had on proficiency and experience, it was desirable to measure the association among the variables. The association measure employed was the product-moment correlation coefficient. The correlation computations and the correlation matrix for the entire data set are given in Appendix F.

The problem of how to group the variables, given this association measure, can be solved through the use of hierarchical clustering techniques. These techniques can also be used to cluster data units (subjects), which call for a different association measure.

For  the  association  measure  among  data  units,  most 
investigators  use  metric  measures  when  the  data  units  are 
described  by  interval  variables.   Metric  measures  must 
satisfy  certain  properties.   If  E  is  a  given  measurement 
space  and  X,  Y,  and  Z  are  points  in  E,  then  an  association 
function  D  is  a  metric  measure  if  and  only  if  it  satisfies 
the  following  conditions: 

(1) D(X,Y) = 0 if and only if X = Y

(2) D(X,Y) ≥ 0 for all X and Y in E

(3) D(X,Y) = D(Y,X) for all X and Y in E

(4) D(X,Y) ≤ D(X,Z) + D(Y,Z) for all X, Y and Z in E

The most common metric measure is the Euclidean distance
function,

$$D_2(X_j, X_k) = \left[ \sum_{i=1}^{n} (x_{ij} - x_{ik})^2 \right]^{1/2} .$$

This is a special case of the general class of metrics called
Minkowski metrics, which have the form

$$D_p(X_j, X_k) = \left[ \sum_{i=1}^{n} |x_{ij} - x_{ik}|^p \right]^{1/p} ,$$

where $p \ge 1$ and $X_j = (x_{1j}, x_{2j}, \ldots, x_{nj})$ is the vector
of scores on the j-th data unit. In this analysis the
Euclidean distance function was used to cluster the data
units.15
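The Minkowski family and the four metric conditions can be made concrete with a short Python sketch. The helper `satisfies_metric_conditions` is a hypothetical spot-check on a finite set of distinct points, not a proof that the conditions hold in general.

```python
import itertools
from typing import Sequence


def minkowski(xj: Sequence[float], xk: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance D_p between two data units; p = 2 gives Euclidean distance."""
    if p < 1:
        raise ValueError("Minkowski metrics require p >= 1")
    return sum(abs(a - b) ** p for a, b in zip(xj, xk)) ** (1.0 / p)


def satisfies_metric_conditions(points, p: float = 2.0, tol: float = 1e-12) -> bool:
    """Spot-check conditions (1)-(4) on a finite set of distinct points."""
    for x, y in itertools.product(points, repeat=2):
        d = minkowski(x, y, p)
        if (d <= tol) != (list(x) == list(y)):  # (1) zero only for identical units
            return False
        if d < 0 or abs(d - minkowski(y, x, p)) > tol:  # (2) non-negative, (3) symmetric
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if minkowski(x, y, p) > minkowski(x, z, p) + minkowski(y, z, p) + tol:  # (4)
            return False
    return True
```

For example, the Euclidean distance between (0, 0) and (3, 4) is 5, while the p = 1 ("city block") distance is 7.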

The hierarchical methods are used to construct a tree
(dendrogram) depicting the relationships among the entities.
The entities are grouped into clusters in order of their
association measures, or similarities, and this ordering provides
a hierarchy, hence the name. The similarities can be any of
several association measures; the matrix of them is generally
called a similarity matrix.

Hierarchical methods divide into agglomerative
and non-agglomerative procedures. The agglomerative procedures
start with the branches (each entity) and combine these
entities until only one cluster (the root) remains.
The non-agglomerative procedures work backward from the root.
Only the former was used in this analysis.

There are many techniques and criteria for hierarchical
clustering. In each, every entity is initially considered to
be a cluster of one. The first method searches the similarity
matrix for the pair of entities with the highest degree of
association (e.g., the largest correlation among the variables)
and groups these two entities. It then searches all remaining
clusters and groups the two clusters that are closest,
i.e., whose closest members have the highest correlation.
This step is repeated until only one cluster remains.
This method is called the "single linkage" method by
Anderberg or the "connectedness" method by Johnson. The
names derive from the fact that each pair of clusters is joined by
the single shortest or strongest link (thus most strongly
connected) between them.

The  second  procedure,  called  complete  linkage,  is  the 
same  as  single  linkage  except  that  the  association  between 
groups  is  the  association  between  their  farthest  members. 
Johnson  calls  this  the  diameter  method  because  all  entities 
in  a  cluster  are  linked  to  each  other  at  some  maximum  distance 
(or diameter).
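Both linkage rules can be sketched as one agglomerative procedure in Python. This sketch works on a distance matrix, so "closest" means smallest; to use correlations as in the text, one would first convert them to distances (for example 1 - r), which is an assumption of this illustration rather than the HI-CLUST program's documented behavior.

```python
from typing import FrozenSet, List, Sequence, Tuple


def agglomerate(dist: Sequence[Sequence[float]],
                linkage: str = "single") -> List[Tuple[List[int], List[int], float]]:
    """Agglomerative hierarchical clustering on a symmetric distance matrix.

    linkage="single":   two clusters are as close as their closest members.
    linkage="complete": two clusters are as close as their farthest members.
    Returns the merges in order as (members_a, members_b, merge_level).
    """
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    between = min if linkage == "single" else max
    clusters: List[FrozenSet[int]] = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = between(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges
```

For four entities at positions 0, 1, 3 and 7 on a line, single linkage merges at levels 1, 2 and 4, whereas complete linkage merges at levels 1, 3 and 7, illustrating how the diameter method keeps clusters compact.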

Hierarchical clustering is usually not very enlightening
when clustering data units. The non-hierarchical
methods are more appropriate for classifying the data units
into a single classification of k clusters. The basic concept
in most of the non-hierarchical methods is to begin
with an initial partition of the data units and adjust the
cluster memberships to obtain a "best" partition.

The simplest and most common non-hierarchical clustering
procedure is that of centroid sorting. Beginning with an
initial partition of k clusters (each usually consisting of
one data unit), each new data unit is assigned to the cluster
with the nearest centroid according to some distance measure.
The centroids are recomputed after each data unit is assigned, and
the procedure is repeated for the remaining data units. After all
data units are assigned, the entire procedure can be reapplied
to all data units over and over until there are no more
changes in cluster memberships, i.e., until convergence.17
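A minimal Python sketch of centroid sorting as described above follows. It assumes Euclidean distance and that the first k data units seed the k clusters; both choices are illustrative assumptions. It also refuses to move a unit out of a single-member cluster so that no cluster is left empty.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid_sort(units: List[List[float]], k: int, max_passes: int = 100) -> List[int]:
    """Centroid sorting; the first k units seed the k clusters (an assumption)."""
    labels: List[int] = [-1] * len(units)
    for c in range(k):
        labels[c] = c

    def centroids() -> List[List[float]]:
        cents = []
        for c in range(k):
            members = [u for u, lab in zip(units, labels) if lab == c]
            cents.append([sum(col) / len(members) for col in zip(*members)])
        return cents

    def nearest(u: List[float]) -> int:
        cents = centroids()  # recomputed from the current memberships
        return min(range(k), key=lambda c: euclidean(u, cents[c]))

    # First pass: assign each remaining unit, one at a time, to the nearest centroid.
    for j in range(k, len(units)):
        labels[j] = nearest(units[j])

    # Further passes: reassign units until no membership changes (convergence).
    for _ in range(max_passes):
        changed = False
        for j, u in enumerate(units):
            best = nearest(u)
            if best != labels[j] and labels.count(labels[j]) > 1:  # never empty a cluster
                labels[j] = best
                changed = True
        if not changed:
            break
    return labels
```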

There are more complex methods than the centroid methods
for clustering data units, and these are based on multivariate
statistical analysis techniques. The scatter of two variables
is the inner product of their two centered score vectors. The
scatter matrix T is a square matrix whose entry $t_{ij}$
is the scatter of variables i and j computed over all
the data units. Each of the h clusters has its own scatter
matrix $W_k$ computed over the data units in the k-th cluster.
The within-groups scatter matrix is given by

$$W = \sum_{k=1}^{h} W_k .$$

The between-groups scatter matrix is denoted by B. Its
elements are

$$b_{ij} = \sum_{k=1}^{h} m_k \, \bar{x}_{ik} \, \bar{x}_{jk} ,$$

where $m_k$ is the number of data units in the k-th cluster and
$\bar{x}_{ik}$ is the mean (centered around the grand mean of the
entire data set) of the i-th variable in the k-th cluster. The
three scatter matrices can be shown to satisfy the relation
T = B + W.18
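The three scatter matrices, and the identity T = B + W, can be checked numerically with a short Python sketch. The function names are hypothetical; the small data set used to exercise it is invented for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mean(rows: Sequence[Sequence[float]]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter(rows, center) -> Matrix:
    """Sum of outer products of the score vectors centered at `center`."""
    p = len(center)
    s = [[0.0] * p for _ in range(p)]
    for x in rows:
        d = [a - c for a, c in zip(x, center)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def scatter_matrices(units, labels) -> Tuple[Matrix, Matrix, Matrix]:
    """Total (T), within-groups (W) and between-groups (B) scatter matrices."""
    grand = mean(units)
    p = len(grand)
    T = scatter(units, grand)
    W = [[0.0] * p for _ in range(p)]
    B = [[0.0] * p for _ in range(p)]
    for k in sorted(set(labels)):
        members = [u for u, lab in zip(units, labels) if lab == k]
        m_k = len(members)
        center_k = mean(members)
        W_k = scatter(members, center_k)
        xbar = [c - g for c, g in zip(center_k, grand)]  # cluster mean centered on grand mean
        for i in range(p):
            for j in range(p):
                W[i][j] += W_k[i][j]
                B[i][j] += m_k * xbar[i] * xbar[j]
    return T, W, B
```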

An important element in many clustering criteria is
the determinantal equation $|B - \lambda W| = 0$. The eigenvalues
of the matrix $W^{-1}B$ are the $\lambda$ solutions of this equation.
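The step from the determinantal equation to the eigenvalues of $W^{-1}B$ can be written out as follows, assuming W is nonsingular. The last line links these roots to the eigenvalue-based criteria of the K-MEANS program, since the trace of $W^{-1}B$ is the sum of its eigenvalues.

```latex
\begin{align*}
\lvert B - \lambda W \rvert
  &= \lvert W \, (W^{-1}B - \lambda I) \rvert
   = \lvert W \rvert \, \lvert W^{-1}B - \lambda I \rvert , \\
\lvert W \rvert \neq 0 \;&\Rightarrow\;
  \bigl( \lvert B - \lambda W \rvert = 0 \iff \lvert W^{-1}B - \lambda I \rvert = 0 \bigr) , \\
\operatorname{tr}(W^{-1}B) &= \sum_i \lambda_i .
\end{align*}
```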

D. J. McCrae has developed a FORTRAN IV computer program
called K-MEANS which utilizes these concepts to cluster the
data into k clusters. He provides four possible criteria
for determining when the assignment of a data unit to a particular
cluster results in the "best" partition of the data set.
These criteria are: (1) minimize the trace of W; (2) maximize
the largest eigenvalue of $W^{-1}B$; (3) maximize the trace of
$W^{-1}B$; and (4) minimize the ratio of the determinants $|W|/|T|$.
This last criterion is more commonly known as Wilks' lambda
statistic. Since T is the same for all partitions, this is
equivalent to minimizing det W.19 The last procedure was the
one used to cluster the data units in this particular analysis.
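Criterion (4) can be illustrated with a minimal Python sketch that computes Wilks' lambda, |W|/|T|, for a given partition, along with the related quantity log(det T / det W) discussed below. The determinant is computed by Gaussian elimination; the function names and the toy data are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter(rows, center) -> Matrix:
    p = len(center)
    s = [[0.0] * p for _ in range(p)]
    for x in rows:
        d = [a - c for a, c in zip(x, center)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def det(a: Matrix) -> float:
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in a]
    n = len(m)
    result = 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            return 0.0
        if piv != col:
            m[col], m[piv] = m[piv], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= f * m[col][c]
    return result


def wilks_lambda(units, labels) -> float:
    """Criterion (4): |W| / |T| for the partition given by `labels` (smaller is better)."""
    T = scatter(units, mean(units))
    p = len(T)
    W = [[0.0] * p for _ in range(p)]
    for k in set(labels):
        members = [u for u, lab in zip(units, labels) if lab == k]
        W_k = scatter(members, mean(members))
        for i in range(p):
            for j in range(p):
                W[i][j] += W_k[i][j]
    return det(W) / det(T)


def log_ratio(units, labels) -> float:
    """log(det T / det W), the quantity plotted against the number of clusters."""
    return -math.log(wilks_lambda(units, labels))
```

For two well-separated groups of points, the partition that follows the groups gives a much smaller lambda than a partition that mixes them, which is what the minimization exploits.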

McCrae's K-MEANS also allows three choices of distance
measures between clusters. These are Euclidean distance,
scaled Euclidean distance, and Mahalanobis distance. Assuming
normal populations $N(\theta_j, \Sigma_j)$ with equal covariance
matrices $\Sigma_1 = \Sigma_2 = \cdots = \Sigma$, so that the populations differ
only in location, the Mahalanobis distance between
populations i and j is given by

$$D^2 = (\theta_i - \theta_j)^T \, \Sigma^{-1} (\theta_i - \theta_j) .$$

This was the distance measure used in this cluster analysis.20

18 Op. Cit., Anderberg, M.R., p. 173-176
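The squared Mahalanobis distance can be sketched in Python as below. The sketch solves $\Sigma y = \theta_i - \theta_j$ rather than forming $\Sigma^{-1}$ explicitly, a common numerical choice; the function names are illustrative and are not those of K-MEANS.

```python
from typing import List, Sequence


def solve(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis_sq(theta_i: Sequence[float], theta_j: Sequence[float],
                   sigma: Sequence[Sequence[float]]) -> float:
    """D^2 = (theta_i - theta_j)' Sigma^-1 (theta_i - theta_j)."""
    d = [a - b for a, b in zip(theta_i, theta_j)]
    y = solve(sigma, d)  # y = Sigma^-1 d
    return sum(di * yi for di, yi in zip(d, y))
```

With $\Sigma = I$ the result reduces to the squared Euclidean distance; a variable with large variance contributes proportionally less to $D^2$.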

The question of how many clusters are present in the
data was mentioned in the previous section. It can be shown
that one prime indicator of the discriminability of variables
in the data set is the logarithm of the ratio det T/det W.
When this quantity is plotted against the number of clusters,
one can gain insight into the appropriate number of clusters
within the data set. As the number of clusters is increased,
the ratio begins to approach a stable value, indicating
that each additional cluster adds less discriminating power. Thus,
one can approximate the maximum number of natural clusters
by observing where the curve levels off. It should be
reemphasized that a primary objective of most cluster
analysis problems is to produce a set of clusters that are well
differentiated from each other.
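The "levels off" judgment was made here by inspecting a plot. As a purely hypothetical stand-in, the Python sketch below returns the number of clusters after which one more cluster raises log(det T/det W) by less than a chosen fraction of the first increase; the 10 percent threshold is arbitrary and is not a value used in this analysis.

```python
from typing import Sequence


def leveling_off_point(log_ratios: Sequence[float], fraction: float = 0.1) -> int:
    """log_ratios[i] is log(det T / det W) for i + 1 clusters.

    Returns the number of clusters after which adding one more cluster
    raises the curve by less than `fraction` of the first increase
    (one to two clusters). The default threshold is a hypothetical choice.
    """
    gains = [b - a for a, b in zip(log_ratios, log_ratios[1:])]
    threshold = fraction * gains[0]
    for i, gain in enumerate(gains):
        if gain < threshold:
            return i + 1
    return len(log_ratios)
```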

As  stated  before,  when  cluster  analyses  are  performed 
on  data  with  several  variables  actually  measuring  the  same 
characteristic,  it  might  be  profitable  to  reduce  the  problem 
to  one  of  only  a  few  primary  variables  by  the  techniques  of 
principal  components  analysis. 


20 Op. Cit., Press, S.J., p. 372-323
For this analysis, the computer program BIMED01M was
utilized to extract the first principal component from the
first two variables and the first principal component from
the fifth, sixth, seventh and eighth variables. BIMED01M
performs the following four basic steps: (1) the data are
normed and centered; (2) the correlation matrix of the
centered and normed data is computed; (3) the eigenvalues
and corresponding eigenvectors of the correlation matrix
are calculated; and (4) the centered and normed data are
transformed into their orthogonal components.21
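The four steps can be mirrored in a minimal Python sketch that extracts only the first principal component. It substitutes power iteration for a full eigen-decomposition in step (3), which suffices because only the largest eigenvalue is needed; it assumes the two largest eigenvalues are distinct. The scaling conventions of BIMED01M are not reproduced, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def standardize(col: List[float]) -> List[float]:
    """Center a variable and scale it to unit standard deviation (divisor n - 1)."""
    n = len(col)
    m = sum(col) / n
    s = math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
    return [(v - m) / s for v in col]


def first_principal_component(columns: List[List[float]],
                              iters: int = 500) -> Tuple[float, List[float], List[float]]:
    """Largest eigenvalue and eigenvector of the correlation matrix, plus component scores."""
    z = [standardize(c) for c in columns]                          # step (1)
    p, n = len(z), len(z[0])
    r = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
         for i in range(p)]                                        # step (2)
    v = [1.0 + 0.1 * i for i in range(p)]                          # arbitrary start vector
    for _ in range(iters):                                         # step (3), power iteration
        w = [sum(r[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(r[i][j] * v[j] for j in range(p)) for i in range(p))
    scores = [sum(v[i] * z[i][t] for i in range(p)) for t in range(n)]  # step (4)
    return eigenvalue, v, scores
```

For two perfectly correlated variables, the first component carries both units of variance (eigenvalue 2) with equal loadings of about 0.707; the sign of the eigenvector is arbitrary.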


21 BMD Manual, Biomedical Computer Programs, Health
Sciences Computing Facility, UCLA, University of California
Press, 1973, p. 193-201
VI. RESULTS OF ANALYSIS

The  two  data  groups,  control  and  accident,  were  first 
investigated  by  discriminant  analysis  with  the  use  of  the 
computer  program  BIMED04M. 

The test for equality of the group covariance matrices (or,
equivalently, the group dispersion matrices) was performed
according to the procedure developed by G. E. P. Box and
illustrated in Appendix D. The hypothesis of equal covariance
matrices was not rejected at the .10 level of significance, so it
was appropriate to apply the linear discriminant analysis procedures.

Testing for the equality of group means, BIMED04M
computed an F statistic of 8.94. For the $\alpha = .001$ level of
significance, the tabled F value is

$$F_{m,\,n_1+n_2-m-1}(1-\alpha) = F_{10,\,50+50-10-1}(1-.001) = F_{10,\,89}(.999) = 3.39 .$$

Since 8.94 exceeds 3.39, one can conclude that there is definitely
a difference in the location of the group means.
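The degrees of freedom above are those of the usual two-sample Hotelling $T^2$ test. Assuming, as this sketch does, that BIMED04M's F is the standard transformation of $T^2$ (the text reports only the F value), the conversion can be written in Python as follows; the function name is hypothetical.

```python
from typing import Tuple


def hotelling_to_f(t2: float, n1: int, n2: int, m: int) -> Tuple[float, Tuple[int, int]]:
    """Two-sample Hotelling T^2 -> F with (m, n1 + n2 - m - 1) degrees of freedom."""
    n = n1 + n2
    f = (n - m - 1) / (m * (n - 2)) * t2
    return f, (m, n - m - 1)
```

With $n_1 = n_2 = 50$ and $m = 10$ variables the degrees of freedom are (10, 89), matching the tabled value used above; the computed F is then compared with $F_{10,89}(.999) = 3.39$.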

The computed discriminant function coefficients were

(-0.00152, 0.00001, -0.00035, 0.00360, -0.00988, 0.00685,
0.00218, -0.00191, -0.00245, 0.00231).

If, after applying the coefficients to a data unit vector $X_j$,

$$-0.00152\,x_{1j} + 0.00001\,x_{2j} + \cdots + 0.00231\,x_{10,j} \le 0 ,$$

then data unit j is assigned to group number two. Otherwise,
the data unit is assigned to group number one.
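The assignment rule can be written directly in Python using the reported coefficients. As stated above, no constant term is applied; group one corresponds to the control group and group two to the accident group, following the interpretation in the next paragraph. The function names are hypothetical, and the vectors in the checks are invented, not pilot records.

```python
from typing import Sequence

# Discriminant function coefficients reported above, for variables 1 through 10.
COEFFICIENTS = (-0.00152, 0.00001, -0.00035, 0.00360, -0.00988,
                0.00685, 0.00218, -0.00191, -0.00245, 0.00231)


def discriminant_score(x: Sequence[float]) -> float:
    if len(x) != len(COEFFICIENTS):
        raise ValueError("expected one value for each of the ten variables")
    return sum(c * v for c, v in zip(COEFFICIENTS, x))


def assign_group(x: Sequence[float]) -> int:
    """Group two (accident) if the score is <= 0, otherwise group one (control)."""
    return 2 if discriminant_score(x) <= 0 else 1
```

For instance, a hypothetical unit with 1000 total hours and zeros elsewhere scores +0.01 and falls in group one, whereas one hour flown in the last twenty-four hours alone scores -0.00988 and falls in group two.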

Those subjects who had high values for the variables
with positive coefficients and low values for the variables
with negative coefficients were classified as being in the
control group, and those with opposite attributes were
classified as belonging to the accident group.

The  discriminant  function  was  applied  to  the  original 
data  units  to  determine  the  performance  of  the  function. 
Fifteen  subjects  of  the  fifty  in  the  control  group  were 
classified  as  being  in  the  accident  group,  while  only  four 
of  the  fifty  in  the  accident  group  were  classified  as  being 
in  the  control  group.   It  is  important  to  observe  that 
although  the  overall  misclassification  rate  is  nineteen 
percent,  the  misclassification  rate  of  the  original  accident 
group  is  only  eight  percent.   This  is  encouraging.   The 
question  of  identifying  correctly  those  in  the  accident 
group  is  of  greater  concern  than  that  of  misclassifying 
those  individuals  in  the  control  group. 
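The rates quoted above follow directly from the two counts. A minimal Python sketch (function name hypothetical) also makes explicit the control-group rate of thirty percent implied by fifteen of fifty, and the correct classification rate of eighty-one percent.

```python
from typing import Dict


def misclassification_rates(n_control: int, control_wrong: int,
                            n_accident: int, accident_wrong: int) -> Dict[str, float]:
    total = n_control + n_accident
    overall = (control_wrong + accident_wrong) / total
    return {
        "overall": overall,
        "control": control_wrong / n_control,
        "accident": accident_wrong / n_accident,
        "correct": 1 - overall,
    }


rates = misclassification_rates(50, 15, 50, 4)
```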

It should be noted that the preceding results were obtained
by performing the discriminant analysis on the raw data
as listed in Appendix A. An analysis was also performed on
the standardized data, listed in Appendix B, but the results
were much poorer. Using standardized data, the overall
misclassification rate was fifty-five percent, quite a loss
of discriminating power. It should be recognized that
standardizing data has the drawback of providing answers to
a problem different from the one originally posed.22
In addition to learning the misclassification rates, it
was also desired to determine which variables had the
strongest effect on classifying the data units and in which
direction the effect was observed. The discriminant function
coefficients indicate whether each variable has a positive
or negative effect, but because of the differences in the
magnitudes of the variables, the coefficients alone do not
tell how large an effect is. It is of interest,
therefore, to compare how much a one-standard-deviation
change in each variable will affect the discriminant function.
Table I presents the standard deviation of each variable in
the second column, the discriminant function coefficients in
the third column, and, in the last column, the effect on the
discriminant function of a one-sigma change in each variable.

TABLE I

               Standard       Disc. Funct.     Effect of a
  Variable     Deviation      Coefficient      1σ Change

     1            6.03          -0.00152        -0.00916
     2         1390.00           0.00001         0.01390
     3           28.20          -0.00035        -0.00987
     4            2.99           0.00360         0.01080
     5            1.42          -0.00988        -0.01400
     6            1.78           0.00685         0.01219
     7            0.68           0.00218         0.00148
     8            1.03          -0.00191        -0.00248
     9            5.14          -0.00245        -0.01259
    10            2.45           0.00231         0.00565
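The last column of Table I is the product of the second and third columns, so recomputing it is a quick consistency check on the table. A short Python sketch using the tabled values:

```python
# Columns of Table I: standard deviation and discriminant function coefficient.
STD_DEV = [6.03, 1390.00, 28.20, 2.99, 1.42, 1.78, 0.68, 1.03, 5.14, 2.45]
COEF = [-0.00152, 0.00001, -0.00035, 0.00360, -0.00988,
        0.00685, 0.00218, -0.00191, -0.00245, 0.00231]

# Effect of a one-sigma change in each variable on the discriminant function.
effects = [s * c for s, c in zip(STD_DEV, COEF)]

# Variable numbers (1-based) with the strongest positive and negative effects.
strongest_positive = max(range(len(effects)), key=lambda i: effects[i]) + 1
strongest_negative = min(range(len(effects)), key=lambda i: effects[i]) + 1
```

Ranking the products identifies variable two as having the largest positive effect and variable five the largest negative one, the two results discussed next.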
The results of Table I indicate that variable two (total
hours) has the strongest positive effect in classifying a
subject as not being in the accident group. A surprising
result, however, is that variable five (time flown in the
last twenty-four hours) has the strongest negative effect,
while variable six (time flown in the last forty-eight hours)
has a strong positive effect. This would suggest that flying
every other day is beneficial, but that too much flying (i.e.,
every day) is detrimental. Similar interpretations can be
made for the remaining variables, although their effects are
less pronounced.

Although  an  overall  misclassification  rate  of  nineteen 
percent  tends  to  indicate  that  there  are  meaningful  differ- 
ences between  the  two  groups,  the  classification  capabilities 
of  the  discriminant  function  are  not  as  sharp  as  one  would 
like.   One  cannot  say  with  assurance  how  a  pilot  not 
initially  a  member  of  either  group  should  be  classified. 
It was desired to learn more about the variables' effects
in order to apply conclusions to subjects beyond the range
of the data. To do this, a second method of analysis was
employed: cluster analysis.

The  first  type  of  cluster  analysis  used  was  hierarchical 
clustering  by  data  units.   The  computer  program  HI-CLUST  was 
used  with  Euclidean  distance  measure  between  data  units  as 
the indicator of association. The results from both the
single linkage and complete linkage methods were not at
all satisfactory. When clustered into the final two groups,
one cluster consisted of ninety-nine units and the other of
a single unit. The cluster of ninety-nine units was composed
of clusters of ninety-three units and six units, again
shedding no light on any relation to accidents. Therefore, the
clustering on data units was reworked using the
non-hierarchical techniques of the computer program K-MEANS.

Initially,  the  one  hundred  data  units  were  clustered 
into  two  groups  to  ascertain  if  there  was  any  association 
directly  with  the  two  original  groups,  control  and  accident. 
Unfortunately,  there  did  not  appear  to  be  any  association, 
as  cluster  number  one  contained  thirty-three  subjects  from 
the  control  group  and  thirty-seven  from  the  accident  group 
while  cluster  number  two  had  seventeen  and  thirteen, 
respectively. 

Figure (3) graphically depicts the cluster means of the
two-group cluster results and the number of subjects in each
cluster. It is interesting to note that fifty-three percent
of cluster number one was composed of subjects from the
accident group, while only forty-three percent of cluster
number two was from the accident group. By inspecting the
cluster means of variables one and two, one can see that the
cluster compositions are inversely related to total experience,
i.e., cluster number one has the higher accident composition and
also fewer years as a designated naval aviator and fewer total hours.
The same kind of relation is seen to apply to the recency and
frequency variables (variables four through ten), but the
separation is not as great. Cluster number one, which has the
higher accident composition, has cluster means which indicate
more recent and frequent flying than the subjects of cluster
number two. Again, the results here are in basic agreement
with those of discriminant analysis in that they indicate
less frequent flying is beneficial. But of course, the
support is very thin and the results are far from conclusive,
especially since the cluster means are seen to be relatively
close for all of variables four through ten.

Figure (3). Two-group cluster means for Cluster One (C1) and Cluster Two (C2), one panel per variable: Years DNA; Total Hours Logged; Hours Flown Last 90 Days; Elapsed Time Since Last Hop (Days); Time Flown in Last 24 Hours; Time Flown in Last 48 Hours; No. Missions Last 24 Hours; No. Missions Last 48 Hours; Carrier Lndgs Last 30 Days; Night Carrier Lndgs Last 30 Days; Accident Percentage.

Number in Clusters

                    Cluster One    Cluster Two    Total
  Control Group          33             17          50
  Accident Group         37             13          50
  Total                  70             30         100

It  was  stated  in  Section  V  that  it  is  possible  to  get 
a  rough  idea  of  the  number  of  natural  clusters  present  in 
the  data  by  plotting  log(det  T  /  det  W)  versus  the  number  of 
clusters.   Figure  (4)  is  a  plot  of  this  information  for  the 
data  under  study.   As  the  number  of  groups  is  increased  the 
curve  begins  to  level  off.   It  appears  that  beyond  nine  groups 
there  is  not  much  additional  information  to  be  gained  by 
grouping  further. 

The primary interest lies in the analysis of two groups,
since there were two groups initially, and in the analysis
of the natural number of groups. The results for intermediate
numbers of groups (between two and nine) are believed to be less useful.

Figure  (5)  graphically  portrays  the  cluster  means  of  the 
nine  cluster  results,  and  the  number  of  data  units  in  each 
cluster.   The  relationships  among  clusters  here  are  not 
apparent  and  there  is  no  one-to-one  correspondence  such  as 
an  inverse  relation  between  the  cluster  means  of  total  hours 
flown  and  composition  of  clusters  by  accident  percentages. 
It is desirable, therefore, to plot the proportion of each
cluster from the accident group versus the cluster means
for each variable. By so doing, trends might appear and
factors influencing the accident proportions might become
more readily observable. These plots are depicted in Appendix
E as Figures (10) through (19). None of these figures reveals
any prominent relationship between its variable and the
proportion of the clusters composed of accident subjects.
Intuitively, one might have hypothesized that as the cluster
means increased (as in Fig. (11), for instance), the
proportion of the clusters composed of accident units would
decrease. Since this kind of relationship did not appear
for the total hours variable, nor did similarly anticipated
relations hold for the other variables, a final type of
analysis was performed on the data.
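The quantities plotted in Figures (10) through (19), the cluster mean of a variable and the proportion of each cluster drawn from the accident group, can be computed with a short Python sketch. The function name and the example data in the check are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def cluster_profiles(values: Sequence[float], clusters: Sequence[int],
                     accident: Sequence[bool]) -> Dict[int, Tuple[float, float]]:
    """Map each cluster to (cluster mean of one variable, proportion of accident subjects)."""
    profiles = {}
    for k in sorted(set(clusters)):
        idx = [i for i, c in enumerate(clusters) if c == k]
        cluster_mean = sum(values[i] for i in idx) / len(idx)
        accident_share = sum(1 for i in idx if accident[i]) / len(idx)
        profiles[k] = (cluster_mean, accident_share)
    return profiles
```

Applied to the two-group solution described earlier, the same proportion gives 37/70, or about 0.53, for cluster number one and 13/30, or about 0.43, for cluster number two.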

It  seemed  plausible  that  although  the  variables  individ- 
ually did  not  reflect  the  contributions  they  had  upon 
accidents,  certain  variables  collectively  might  demonstrate 
such  an  effect.   To  determine  which  variables  to  combine,  a 
hierarchical  clustering  analysis  was  performed  by  the  computer 
program  HI-CLUST.   Product-moment  correlation  was  used  as 
the  association  measure  between  variables.   Both  the  methods 
of  single  linkage  and  complete  linkage  clustering  as  discussed 
in  Section  V  were  employed.   The  results  are  shown  as 
hierarchical  trees  (dendrograms)  in  Figures  (6)  and  (7). 

The  results  of  both  hierarchical  methods  are  similar. 
Variables  number  one  and  two  are  highly  correlated  and 
variables  five,  six,  seven  and  eight  are  highly  correlated. 
Therefore,  it  was  decided  to  combine  those  respective  variables, 
calling  the  first  the  experience  variable  and  the  second  the 
frequency  variable.   In  order  to  eliminate  all  unnecessary  or 
distracting  influences  it  was  also  considered  prudent  to 
eliminate  variables  nine  and  ten  since  very  few  accidents 
involved  carrier  landings  and  many  subjects  in  both  groups 
were  not  involved  in  carrier  operations  during  the  period 
investigated. 

As discussed in Sections IV and V, BIMED01M was used to
extract the first principal components from those combinations
of variables listed above to obtain the total
experience variable and the frequency variable. The principal
components (exhibited in Appendix G) were extracted from the
standardized data (exhibited in Appendix B) as required by
BIMED01M.

Figure (6). Hierarchical Tree for Variables by Connections or Single Linkage Method (leaves in order: variables 5, 6, 7, 8, 3, 9, 10, 1, 2, 4; horizontal axis: Product-Moment Correlation, 1.0 to -0.1).

VARIABLE DEFINITIONS

1 - Years DNA
2 - Total Hours Logged
3 - Hours Flown in Last 90 Days
4 - Days Since Last Flight
5 - Time Flown in Last 24 Hours
6 - Time Flown in Last 48 Hours
7 - Missions in Last 24 Hours
8 - Missions in Last 48 Hours
9 - Car. Lndgs. Last 30 Days
10 - Night Car. Lndgs. in Last 30 Days

Figure (7). Hierarchical Tree for Variables by Diameter or Complete Linkage Method.

With the data reduced to four main variables, the cluster
analysis program K-MEANS was again used to investigate the
data. As was done with the data in ten variables (or
regular space), the data were first investigated by clustering
into just two groups. The cluster means are depicted in
Figure (8). As was true in the regular space analysis, there
does not appear to be any association between the clustering of
data units and membership in the accident group. Again,
there were thirty-three subjects from the control group and
thirty-seven from the accident group in cluster number one,
and seventeen and thirteen, respectively, in cluster number
two. Thus, fifty-three percent of cluster number one was
from the accident group and forty-three percent of cluster
number two was from the accident group.

The graph of log(det T / det W) versus number of
clusters was plotted for the reduced space analysis in
Figure (9) and was also found to indicate that beyond nine
clusters, minimal information is gained. Therefore, a plot
of the proportion of each cluster from the accident group versus
the cluster means was constructed for each of the four
variables in the reduced space with nine cluster groupings.
Figures (20) through (23) in Appendix H are graphs of the
results.
Figure (8)

Figure (9)

The resultant plots in Appendix H are not very informative.
Three of the four "new" variables do not appear to reveal any
structure, but the first, the experience variable, may be of
some interest. A parabolic fit has been drawn freehand,
and the accident rate seems to bottom out for experience in
the interval (-1,0). This is misleading, however. The
interval (-1,0) of the experience variable corresponds to
values of $X_1$ and $X_2$ which lie between the modes of their
respective distributions. Only seven of the one hundred aviators
are in this range.
VII. CONCLUSIONS

The  three  analytical  methodologies  employed  in  this 
investigation  were  primarily  utilized  as  exploratory  tools 
to  determine  if  there  were  significant  differences  in  the 
various  flight  time  statistics  recorded  for  sample  groups 
of  pilots  with  and  without  accidents.   The  discriminant 
analysis  techniques  provided  the  best  indication  that  there 
were  differences  which  could  be  used  to  categorize  the 
pilots  according  to  the  probability  of  belonging  to  the 
accident  group. 

It should be recognized that failure to distinguish
among pilots according to their flight statistic attributes
is not necessarily a fault of the analytical procedures, but
rather an inherent inability of the data, as currently conceived, to
discriminate among subjects. This does not suggest, however,
that this approach to accident analysis has no merit. It
does point out the need to expand the investigation to
include more quantitative aspects of flying. Many other
variables, such as instrument time, synthetic trainer time,
number of instrument approaches, and average time spent briefing
flights, and subjective attributes such as training command
flight grades and NATOPS quiz grades, could be included.
Breaking the investigation down into more restrictive
areas, such as including only accidents in a particular phase
of flight or only accidents by a particular type
of aircraft such as attack or patrol, might also prove to
be more relevant. It should be informative to expand the
time span of the data base to five or ten years so
as to have a larger sample size on which to base results.
Also, enlarging the size of the control group would help
to eliminate the effects of non-randomness which could bias
the data.

Despite  the  fact  that  the  data  investigated  in  this 
analysis  did  not  contain  those  characteristics  which  could 
identify  the  underlying  accident  generating  mechanism,  it 
is  still  considered  worthwhile  to  pursue  the  basic  ideas 
developed  here  in  future  accident  analysis. 
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APPENDIX   A 


TABLE II - Control Group Raw Data

Variables 

Obs. 

1 

2 

3 

h 

5 

6 

7 

8 

9 

10 

1 
2 

I 

5 
6 

1 

0137 

10 

10 

0.5 

0.7 

OA 

0.6 

00 

00 

15 

3398 

12 

01 

3.3 

5.7 

1.5 

2.5 

01 

01 

16 

0589 

36 

05 

0.9 

1.8 

0.5 

0.9 

CO 

00 

12 

1+9^ 

35 

oh 

0.5 

0.9 

oA- 

0.8 

00 

00 

16 

*A66 

18 

Ch 

0.6 
1.6
03

00

A
18

05

0.3
0.3

0.5

00

00

7

5
32

05 

0.9 

1.6 

0.5 

0.7 

00 

00 

8 

8 

1621 

A 

09 

0.9 

1.3 

0.5 

0.6 

00 

00 

9 

h 

1539 

h7 

6h 

0.9 

1.9 

0.7 

1.0 

00 

00 

10 

2 

0968 

30 

06 
0.5

00

00

11

1

0320
0.7
0.8

lA

00
12

3

1532

ch

17
0.3

0.6

00

00

13

3
\2

03

l.l

1.8
03

01

A

2
00
15

1
36

03
1.1

1.6 

00 

00 

16 

1 
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17 
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18 
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00 
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0.9 

00 
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20 

16 
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00 
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21 

A 
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00 
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22 
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23 

26 
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29 
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00 

00 

25 
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03 
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26 

h 
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27 
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18 
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28 
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36 

O^f 
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TABLE  II  (Continued) 

Obs. 
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0.6 
TABLE III - Accident Group Raw Data

Variables

Obs.
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8 
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APPENDIX   B 


TABLE IV - Control Group Standardized Data

                         Variables
Obs.          1          2          3          4          5
   1    -1.0761    -1.3095    -1.1829     1.1957    -0.5698
   2     1.1200     0.8506    -1.0929    -1.1437     4.3852
   3     1.2769    -1.0101    -0.0126    -0.1040     0.1380
   4     0.6494     1.8721    -0.0576    -0.3639    -0.5698
   5     1.2769     1.5581    -0.0576    -0.3639    -0.3929
   6     0.9631     0.9593    -0.8228    -0.1040    -0.9238
   7    -0.4486    -0.2867    -0.1926    -0.1040     0.1380
   8     0.0220    -0.3265    -1.0028     0.9358     0.1380
   9    -0.6055    -0.3808     0.4825    -0.3639     0.1380
  10    -0.9192    -0.7591    -0.2827     0.1560    -0.5698
  11    -1.0761    -1.1883     0.2124    -0.6239    -0.2159
  12    -0.7624    -0.3855    -1.4529     3.0153    -0.9238
  13    -0.7624    -0.8684     0.2575    -0.6239     0.4920
  14    -0.9192    -0.0999     4.3804    -0.8838     1.9077
  15    -1.0761    -1.1850    -0.0126    -0.6239    -0.3929
  16    -1.0761    -1.2115     0.3025    -0.8838     0.1380
  17    -1.0761    -1.2764    -0.1926    -0.6239    -0.2159
  18    -0.9192    -1.2148     0.3375    -0.8838    -0.0389
  19     1.5906     1.3382     1.2477    -0.6239    -0.0389
  20     1.2769     1.1838    -0.1026    -0.6239    -0.3929
  21     0.9631     0.9507    -1.0478     1.9755    -1.1007
  22     0.3357     0.8692    -0.5527    -0.1040    -0.5698
  23     2.8455     2.3040     0.3475    -0.6239    -0.2159
  24    -0.6055    -0.0569    -0.3277     0.1560    -0.0389
  25    -1.0761    -1.0644     0.4375    -0.6239     0.1380
  26    -0.6055    -0.1563     0.4375    -0.1040    -0.7468
  27    -0.1349    -0.0549    -0.8228     1.1957    -0.0389
  28    -0.6055    -0.6266    -0.0126    -0.3639    -0.2159
  29    -0.6055     0.2710     0.7076    -0.3639    -0.0389
  30    -0.7624    -0.5564     0.4375    -0.3639    -0.7468
  31    -0.9192    -0.6670     0.8876    -0.3639    -0.3929
  32    -1.0761    -1.1804     0.0324    -0.6239    -0.2159
  33    -0.9192    -0.3994     1.4727    -0.8838    -0.0389
  34    -0.4468     0.1538    -0.4672     0.6758    -0.2159
  35     1.5906     0.7923     0.1224    -0.3639     1.0229
  36     0.6494     0.4538    -0.7328    -0.3639    -0.9238
  37     0.9631     0.7433    -1.0028    -0.6239    -0.9238
  38     0.8063     0.6711    -1.2279     3.0153    -1.2777
  39     0.6494    -1.0022    -0.2377     0.1560     1.1998
  40     0.3357     0.9613    -0.5527    -0.1040     0.1380
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TABLE   IV   (Continued) 


                         Variables
Obs.          6          7          8          9         10
   1    -0.8123    -0.6847    -0.7165    -0.3559    -0.3736
   2     3.6807     2.3281     2.5094     0.2181     0.9608
   3     0.1761    -0.4108    -0.2071    -0.3559    -0.3736
   4    -0.6326    -0.6847    -0.3769    -0.3559    -0.3736
   5     0.0863     2.6020     1.6605     1.3660    -0.3736
   6    -0.9921    -0.9586    -0.8862    -0.3559    -0.3736
   7    -0.0036    -0.4108    -0.5467    -0.3559    -0.3736
   8    -0.2732    -0.4108    -0.7165    -0.3559    -0.3736
   9     0.2660     0.1370    -0.0373    -0.3559    -0.3736
  10    -0.3630    -0.9586    -0.8862    -0.3559    -0.3736
  11    -0.1833     0.4109     0.6418    -0.3559    -0.3736
  12    -1.1718    -0.9586    -0.7165    -0.3559    -0.3736
  13     0.1761     0.6848     0.4720     1.3660     0.9608
  14     3.2314     1.7803     2.3396    -0.3559    -0.3736
  15    -0.0036     1.2325     0.9813    -0.3559    -0.3736
  16     0.6254     0.9586     1.3209    -0.3559    -0.3736
  17    -0.2732     0.9586     0.4720    -0.3559    -0.3736
  18    -0.0934     1.2325     0.8116    -0.3559    -0.3736
  19     0.5356    -0.1369    -0.2071    -0.3559    -0.3736
  20    -0.3630     0.1370     0.1324    -0.3559    -0.3736
  21    -1.1718    -1.2325    -1.3956    -0.3559    -0.3736
  22    -0.6326    -0.6847    -0.7165     0.2181     0.9608
  23     0.1761     0.1370     1.3209    -0.3559    -0.3736
  24     0.0863    -0.4108    -0.2071    -0.3559    -0.3736
  25    -0.0036     1.2325     1.3209     1.3660     0.9608
  26    -0.8123    -1.2325    -1.2258    -0.3559    -0.3736
  27     0.0863    -0.6847    -0.3769    -0.3559    -0.3736
  28     0.2660     0.9586     1.8302    -0.3559    -0.3736
  29    -0.2732    -0.4108    -0.5467    -0.3559    -0.3736
  30    -0.9022    -1.2325    -1.2258    -0.3559    -0.3736
  31    -0.2732    -0.9586    -0.8862    -0.3559    -0.3736
  32    -0.4529     0.6848     0.4720    -0.3559    -0.3736
  33    -0.1833     0.6848     0.3022    -0.3559    -0.3736
  34     0.2660    -0.9586    -0.8862    -0.3559    -0.3736
  35     0.9849     0.6848     0.9813     0.2181    -0.3736
  36    -1.0819    -0.6847    -0.8862    -0.3559    -0.3736
  37    -0.9022    -0.4108    -0.5467    -0.3559    -0.3736
  38    -1.2616    -1.5064    -1.3956    -0.3559    -0.3736
  39     0.7153     0.9586     0.6418    -0.3559    -0.3736
  40     0.0863    -0.4108    -0.2071    -0.3559    -0.3736
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TABLE   IV   (Continued) 


Variables 

Obs. 

1 

2 

3 

if 

5 

hi 

*+2 

^3 
M+ 

h5 
h6 
h7 
hS 
h9 
50 

-0.2918 

-O.M+86 

-0.762*+ 

-0.762*+ 

-0.9192 

-0.762*+ 

1.1200 

1.2769 

1.2769 

1.^337 

0.370*+ 
-0.6060 
-0.6*+31 
-0.93W 
-0.7^79 
-0.2285 

1.3633 
-1.0505 

1.8078 

1.9933 

1.7878 
-1.0028 

0.077^ 
-1.1829 

0A375 

: 1.1127 

-0.2377 

-i.oi+78 

2.0579 

-0.5977 

-0.8838 

2.*+951+ 
-O.IO^O 

1.1957- 

-0.10*+0 

-0.8838 
-0.3639 

1.9755 

-0.8838 
-0.10M-0 

2.6156 

-l.h^h? 

-O.9238 

-0.5698 
-0.5698 

O.6689 
1.1998 

O.6689 
1.3768 

-O.0389 
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TABLE   IV   (Continued) 


Variables 

Obs. 

6 

7 

8 

9 

10  - 

i 

hi 
!       1*2 

V-5 
lf6 

if8 

*f9 

50 

2.1531 

-l.M+ll* 

-1.0819 

-0.6326 

-0.7225 

0.2660 

.     1.3^3 

0.625V 

1.161*6 

-O.OO36 

1.7803 
-1 . 506V 
-1.5061* 
-0.9586 

0.1370 

-O.1369 

0.68lf8 

-0.1369 
0.  If  109 

-O.I369 

1.1*907 

-l.5651+ 
-1.565^ 
-0.8862 
-0.2071 
-0.2071 
0.61*18 

-0.0373 

0.1321* 

-0.2701 

5.3837 
-0.3559 
-0.3559 
-0.3559 
-0.3559 

0.7921 
-0.3559 
-0.3559 

3.0879 

0.2181 

3-6296 
-O.3736 
-0.37^ 
-0.3736 
-0.3736 

3.6296 
-0.3736 
-0.3736 

3.6296 

0.9608 
TABLE V - Accident Group Standardized Data

                         Variables
Obs.          1          2          3          4          5
   1    -0.9374    -1.0456     0.1413    -0.4195    -0.8172
   2    -0.0531     0.4047     1.7109    -0.4195     0.0621
   3    -0.2299     0.0300     0.5435     1.8479    -0.5069
   4    -0.5836    -0.6384     0.4431    -0.4195    -0.9724
   5    -0.9374    -1.0329    -0.2813    -0.4195    -0.5793
   6    -0.9374    -1.2480     0.1413    -0.4195    -0.6103
   7     1.0081     0.9032    -0.8548    -0.9863     0.8379
   8    -0.5836    -0.4772     0.0809    -0.4195    -0.9724
   9     2.7767     1.4009    -1.7006    -0.9863    -0.4034
  10     0.1238     0.8238     0.9261     1.2810    -0.9724
  11    -0.2299     0.0014     0.5639    -0.9863     0.9414
  12    -0.2299    -0.8479     1.0166    -0.4195     0.4241
  13     0.1238    -0.2065     0.1715    -0.4195     0.7862
  14    -0.9374    -1.0456    -0.5832    -0.9863    -0.1965
  15     1.0081     1.1762     1.8014     0.7142     1.4586
  16    -0.9374    -0.6376     1.4090    -0.9863     1.2517
  17    -0.5836    -0.4606     0.9864     2.9815    -0.9724
  18    -0.9374    -1.0837    -0.2813    -0.4195    -0.4552
  19    -0.5836    -0.5058     1.7713    -0.4195     0.0103
  20    -0.7605    -0.8281     0.4733     0.7142    -0.9724
  21     2.0693     1.8089     0.1111     0.7142    -0.9724
  22    -0.4068    -0.4590    -0.9756     0.7142     3.8378
  23     0.4775     0.7428    -1.0058    -0.4195    -0.0931
  24    -0.5836    -0.4513     0.3827    -0.4195     0.7862
  25    -0.0531     0.0316    -0.4926     0.1474    -0.9724
  26    -0.9374    -1.0043     0.5940    -0.4195     0.8379
  27    -0.4068    -0.3820    -0.1304     0.1474    -0.0414
  28    -0.5836    -0.4709     1.1374     0.7142    -0.9724
  29     1.0081     1.7549    -1.3982     0.7142    -0.9724
  30     2.0639     1.8406    -0.0097     3.5484    -0.9724
  31    -0.2299    -0.0652    -1.4284    -0.9863     0.1183
  32     1.0081     1.2405    -1.6095    -0.4195    -0.3000
  33     2.0693     2.1740     0.3827     0.7142    -0.0724
  34    -0.2299    -0.1010     2.5862    -0.4195     0.7345
  35    -1.1142    -1.1059    -1.4284    -0.4195    -0.4552
  36    -0.5836    -0.9090     0.4733     0.7142    -0.9724
  37    -0.2299    -0.6796    -0.9152    -0.4195     1.2000
  38     1.0081     1.3850     0.2620    -0.4195     0.7345
  39     1.0081     1.6271     1.0770    -0.4195     2.4413
  40    -0.5836    -0.6074    -0.7341     2.4147    -0.9724
  41    -0.5836    -0.2613    -1.2472    -0.4195    -0.4552
  42    -0.2299    -0.2105     0.7450    -0.4195     1.2000
  43     2.0693     2.3931     0.1111    -0.4195    -0.0931
  44    -0.2299    -0.1764    -0.8548    -0.4195     0.3207
  45    -0.7605    -0.9035     0.2016     0.1474    -0.9724
  46    -0.2299    -0.1891    -1.5793    -0.9863     0.7345
  47     2.0693     0.9659    -1.4284    -0.4195    -0.6620
  48    -0.9374    -0.9598    -0.5228     0.7142    -0.9724
  49    -0.5836    -0.5344    -0.7945    -0.9863     0.0621
  50    -0.9374    -1.1757     0.0507    -0.9863     0.3207
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TABLE  V   (Continued) 


Variables 

Obs. 

6 

7 

8 

9 

10 

1 

-0.5903 

-0.11^0 

-0.5100 

0.3115 

-0.385+2 

2 

-0*32  52 

-0.115+0 

-0.5100 

-0.3965 

-0.385+2 

I 

-O.8II3 

-0.115+0 

-0.5100 

-0.. 9628 

-0.6797 

1.0888 

-1.2536 

0.9901 

-0.8212 

-0.6797 

5 

l.M+23 

1.0256 
-0.11M-0 

1.7V01 

-0.9628 

-0.6797 

'     6 

-0.8997 

-0.5100 

-0.9628 

-0.6797 

7 

0.9120 

1.0256 

0.9901 

-0.1133 

0.2069 

8 

-0.988O 

-1.2536 

-0.5100 

0.59V7 

-0.0887 

9 

00.3376 

-0.115+0 

0.2!+00 

-0.9628 

-0.6797 

10 

-1.2090 

-1.2536 

-1.2601 

1.5858 

-0.6797 

11 

1.2213 

1.0256 

1.75+01 

l.W+2 

1.3890 

12 

0.1608 

-0. 115+0 

0.9901 

V. 2761 

5+.05+87 

K 

1.1772 

1.0256 

0.9901 

1.1611 

-0.6797 

0.1167 

-0.11V0 

0.25+00 

1.727V 

1.685+5 

15 

0.8678 

-0.11V0 

-0.5100 

-0.3965 

1.0935 

16 

1.7958 

1.0256 

0.9901 

0.7363 

0.7979 

17 

-1.2090 

-1.2536 

-1.2601 

1.3026 

2.8666 

18 

-0.7671 

1.0256 

0.9901 

-0.1133 

-0.6797 

19 

-0.369^ 

-0.115+0 

-0.5100 

0.0283 

0.5025+ 

20 

-0.7671 

-1.2536 

-0.5100 

-0.1133 

-0.0887 

21 

-0. 635+5 

-0.115+0 

-0.5100 

-0.1133 

-0.6797 

22 

2.9005 

-0.11.5+0 

-0.5100 

-0.9628 

-0.6797 

22 

0.20^0 

1.0256 

1.7^01 

-0.9628 

-0.6797 

2h 

1.2655 

1.0256 

0.9901 

0.5+531 

0.7979 

25 

-O.3252 

-1.2536 

-0.5100 

-O.255+9 

-0.6797 

26 

0.5585 

1.0256 

0.9901 

0.5+^31 
0.5W 

-0.8807 

27 

0.0725 

-o.im-o 

0.25+00 

-0.6797 

28 

-1.2090 

-1.2536 

-1.2601 

0.3115 

1.0935 

29 

-1.2090 

-1.2536 

-1.2601 

-0.9628 

-0.6797 

30 

-1.2090 

-1.2536 

-1.2601 

-0.9628 

-0.6797 

31 

0.5+260 

1.0256 

0. 2V00 

-0.9628 

-0.6797 

32 

-0.63^+5 

-0.115+0 

-0.5100 

-0.9628 

-0.6797 

33 

-1.2090 

-1.2536 

-1.2601 

0.3115 

0.502^+ 

3^ 

1.7516 

2.1653 

3.2^+02 

-0.9628 

-0.6797 

35 

-0.7671 

-0.115+0 

-0.5100 

-0.9628 

-0.6797 

36 

-1.2090 

-1.2536 

-1.2601 

-0.255+9 

-0.385+2 

37 

0. 61+69 

1.-0256 

0.25+00 

^-0.8212 

-0.6797 

38 

0. 21+92 

-0.115+0 

-0.5100 

0.7363 

0.5025+ 

39 

1.707V 

2.1653 

0.9901 

0.0283 

0.5025+ 

ho 

-1.2090 

01.2563 

-1.2601 

0.8779 

1.0935 

■  hi 

-0.7671 

-0.115+0 

-0.5100 

-0.1133 

-0.385+2 

lf2 

0.6^69 

1.0256 

0.25+00 

0.5+531 

-0.385+2 

^3 

-OA578 

-0.115+0 

-0.5100 

0.595+7 

1.0935 

M+ 

-0..I0U3 

-0.111+0 

-0.5100 

-0.9628 

-0.6797 

5+5 

-0.0159 

-1.2536 

-0.5100 

0.5+531 

1.0935 

h6 

0.25+92 

2.1653 

0.9901 

-0.9628 

-0.6797 

h7 

-0.9^38 

-0.115+0 

-0.5100 

-n.9628 

-0.6797 

5+8 

-1.2090 

-1.2536 

-1.2601 

0.59V7 

-0,6387 

>+9 

0.7795 

-0.11M-0 

0.25+00 

-0.1133 

-0.6797 

50 

-    0.5+702 

1.0256 

0.9901 

-0.9628 

-0.6797 
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APPENDIX  C 
DISPERSION  MATRICES 

An element of the group dispersion matrix is given by the formula

$$ s_{ij} = \frac{1}{N-1} \sum_{k=1}^{50} (x_{ik} - \bar{x}_{i})(x_{jk} - \bar{x}_{j}) $$

where i = 1,2,...,10 and j = 1,2,...,10. An element of the dispersion matrix is readily seen to differ from an element of the covariance matrix only by a constant factor that depends on N, where N is the number of data units or observations.
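The formula above can be checked with a minimal NumPy sketch. It assumes one group's data are held in an N × 10 array with one row per pilot. The function name `dispersion_matrix` is illustrative and does not come from the thesis.

```python
import numpy as np


def dispersion_matrix(x):
    """Group dispersion matrix.

    Element (i, j) is (1 / (N - 1)) * sum_k (x_ik - xbar_i) * (x_jk - xbar_j).
    x: array of shape (N, m), one row per observation, one column per variable.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    dev = x - x.mean(axis=0)          # deviations from each variable's mean
    return dev.T @ dev / (n - 1)      # (m, m) matrix of summed cross-products / (N - 1)
```

NumPy's `np.cov(x, rowvar=False)` uses the same N - 1 divisor by default. Passing `bias=True` divides by N instead, so the two results differ by exactly the factor N/(N - 1).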

An element of the pooled within groups dispersion matrix is given by the formula

$$ s^{W}_{ij} = \frac{1}{N_1 + N_2 - 2} \sum_{l=1}^{2} \sum_{k=1}^{50} (x_{ikl} - \bar{x}_{il})(x_{jkl} - \bar{x}_{jl}) $$

where i = 1,2,...,10 and j = 1,2,...,10, and N1 and N2 are the number of observations in the respective groups.
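The pooled matrix can be sketched the same way. Each group contributes its sums of cross-products of deviations from its own means, and the total is divided by the number of observations minus the number of groups (N1 + N2 - 2 for two groups). With two groups of 50 pilots, the result is simply the average of the two group dispersion matrices. The function name is again illustrative.

```python
import numpy as np


def pooled_dispersion(groups):
    """Pooled within-groups dispersion matrix.

    groups: sequence of arrays, each of shape (N_l, m), one row per observation.
    Divides the summed within-group cross-products by (sum of N_l) - (number of groups).
    """
    total_n = 0
    scatter = None
    for x in groups:
        x = np.asarray(x, dtype=float)
        dev = x - x.mean(axis=0)          # deviations from this group's own means
        s = dev.T @ dev
        scatter = s if scatter is None else scatter + s
        total_n += x.shape[0]
    return scatter / (total_n - len(groups))
```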
TABLE VI - Control Group Dispersion Matrix

Variable           1           2           3           4           5
       1       41.47     7700.41      -25.90        0.71        0.58
       2     7700.41  2325433.00      827.55     -608.69       95.87
       3      -25.90      827.55      503.67      -52.11        4.87
       4        0.71     -608.69      -52.11       15.10       -0.95
       5        0.58       95.87        4.87       -0.95        0.33
       6        1.21      249.13       12.76       -2.00        0.61
       7        0.05       -9.39        3.48       -0.84        0.14
       8        0.29       24.17        5.74       -1.30        0.23
       9        0.78      538.38       14.62       -1.76        0.48
      10        0.22      235.05        6.04       -0.83        0.21

Variable           6           7           8           9          10
       1        1.21        0.05        0.29        0.78        0.22
       2      249.13       -9.39       24.17      538.38      235.05
       3       12.76        3.48        5.74       14.62        6.04
       4       -2.00       -0.84       -1.30       -1.76       -0.83
       5        0.61        0.14        0.23        0.48        0.21
       6        1.26        0.30        0.52        0.77        0.32
       7        0.30        0.14        0.21        0.27        0.07
       8        0.52        0.21        0.35        0.34        0.09
       9        0.77        0.27        0.34        3.10        1.11
      10        0.32        0.07        0.09        1.11        0.57
TABLE VII - Accident Group Dispersion Matrix

Variable           1           2           3           4           5
       1       32.62     6776.32      -31.67        1.43       -0.45
       2     6776.32  1619479.00    -2719.64      420.05       -9.87
       3      -31.67    -2719.64     1119.93        9.76        7.69
       4        1.43      420.05        9.76        3.18       -1.23
       5       -0.45       -9.87        7.69       -1.23        3.81
       6       -1.46     -322.23       11.15       -1.97        3.82
       7       -0.32      -66.75        0.93       -0.93        1.17
       8       -1.33     -346.63        6.47       -1.43        1.29
       9       -6.55    -1304.87       86.96        0.60       -0.37
      10       -1.42     -269.94       44.58        0.86        0.29

Variable           6           7           8           9          10
       1       -1.46       -0.32       -1.33       -6.55       -1.42
       2     -322.23      -66.75     -346.63    -1304.87     -269.94
       3       11.15        0.93        6.47       86.96       44.58
       4       -1.97       -0.93       -1.43        0.60        0.86
       5        3.82        1.17        1.29       -0.37        0.29
       6        5.23        1.35        2.24       -0.85       -0.10
       7        1.35        0.79        0.97       -0.96       -0.44
       8        2.24        0.97        1.81       -0.13       -0.19
       9       -0.85       -0.96       -0.13       50.90       19.04
      10       -0.10       -0.44       -0.19       19.04       11.68
TABLE VIII - Pooled Within Groups Dispersion Matrix

Variable           1           2           3           4           5
       1       37.05     7238.36      -28.78        1.07        0.06
       2     7238.36  1972456.00     -946.04      -94.32       43.00
       3      -28.78     -946.04      811.80      -21.18        6.28
       4        1.07      -94.32      -21.18        9.14       -1.09
       5        0.06       43.00        6.28       -1.09        2.07
       6       -0.13      -36.55       11.96       -1.98        2.21
       7       -0.13      -38.07        2.20       -0.89        0.66
       8       -0.52     -161.23        6.11       -1.37        0.76
       9       -2.88     -383.24       50.79       -0.53        0.05
      10       -0.60      -17.44       25.31        0.01        0.25

Variable           6           7           8           9          10
       1       -0.13       -0.13       -0.52       -2.88       -0.60
       2      -36.55      -38.07     -161.23     -383.24      -17.44
       3       11.96        2.20        6.11       50.79       25.31
       4       -1.98       -0.89       -1.37       -0.53        0.01
       5        2.21        0.66        0.76        0.05        0.25
       6        3.24        0.83        1.38       -0.04        0.11
       7        0.83        0.46        0.59       -0.35       -0.18
       8        1.38        0.59        1.08        0.11       -0.05
       9       -0.04       -0.35        0.11       27.00       10.07
      10        0.11       -0.18       -0.05       10.07        6.13
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APPENDIX  D 
STATISTICAL  TEST  FOR  EQUALITY  OF  DISPERSION  MATRICES 

Given a sample of two groups and m variables with group dispersion matrices S1 and S2, pooled within groups dispersion matrix SW, and total sample observations N = N1 + N2, the hypothesis that the dispersion matrices (and thus the covariance matrices) are statistically equal may be determined by the following computations:²²

$$ A = (N-2)\ln|S_W| - (N_1-1)\ln|S_1| - (N_2-1)\ln|S_2| $$

$$ B = \left[\frac{1}{N_1-1} + \frac{1}{N_2-1} - \frac{1}{N-2}\right] \frac{2m^2 + 3m - 1}{6(2-1)(m+1)} $$

$$ C = \left[\frac{1}{(N_1-1)^2} + \frac{1}{(N_2-1)^2} - \frac{1}{(N-2)^2}\right] \frac{(m-1)(m+2)}{6(2-1)} $$

$$ D = \frac{m(m+1)}{2} $$

$$ E = \frac{D+2}{|B^2 - C|} $$

If B² is greater than C, the test statistic is

$$ \frac{E}{D} \cdot \frac{A(1 - B + 2/E)}{E - A(1 - B + 2/E)} \sim F_{D,E} $$

If C is greater than B², the test statistic is

$$ \frac{A}{D}\left(1 - B - \frac{D}{E}\right) \sim F_{D,E} $$

²² Box, G.E.P., "A General Distribution Theory for a Class of Likelihood Criteria," Biometrika 36 (1949), pp. 317-346.
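The computations above can be sketched in Python with NumPy. This is an illustrative sketch, not the program used for the thesis. The function name `box_m_test` is hypothetical. The group dispersion matrices use the N - 1 divisor of Appendix C, and the two cases follow the comparison of B² with C given above. No p-value is computed here. The returned F statistic would be compared with an F distribution on D and E degrees of freedom, for example with `scipy.stats.f.sf` if SciPy is available.

```python
import numpy as np


def box_m_test(x1, x2):
    """Box's test for equality of two dispersion matrices (F approximation).

    x1, x2: arrays of shape (N1, m) and (N2, m), one row per observation.
    Returns (A, F, D, E) in the notation of this appendix.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, m = x1.shape
    n2 = x2.shape[0]
    n = n1 + n2

    s1 = np.cov(x1, rowvar=False)                      # divisor N1 - 1
    s2 = np.cov(x2, rowvar=False)                      # divisor N2 - 1
    sw = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n - 2)     # pooled within groups

    def logdet(s):
        sign, value = np.linalg.slogdet(s)             # log|S| without overflow
        if sign <= 0:
            raise ValueError("dispersion matrix is not positive definite")
        return value

    a = (n - 2) * logdet(sw) - (n1 - 1) * logdet(s1) - (n2 - 1) * logdet(s2)
    b = ((1 / (n1 - 1) + 1 / (n2 - 1) - 1 / (n - 2))
         * (2 * m * m + 3 * m - 1) / (6 * (2 - 1) * (m + 1)))
    c = ((1 / (n1 - 1) ** 2 + 1 / (n2 - 1) ** 2 - 1 / (n - 2) ** 2)
         * (m - 1) * (m + 2) / (6 * (2 - 1)))
    d = m * (m + 1) / 2
    e = (d + 2) / abs(b * b - c)

    if c > b * b:
        f = (a / d) * (1 - b - d / e)
    else:
        k = 1 - b + 2 / e
        f = (e / d) * a * k / (e - a * k)
    return a, f, d, e
```

If the two groups have identical dispersion matrices, A and the F statistic are both zero. A grows as the group matrices diverge.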
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APPENDIX  E 

Figures  (10)  through  (19)  depict  cluster  analysis  plots 
for  each  of  the  ten  variables  being  studied.   The  label  "P" 
on  the  vertical  axis  of  each  figure  represents  the  proportion 
of  each  cluster  that  originates  from  the  accident  group. 
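To illustrate how the P values on those plots are defined, the sketch below computes, for each cluster, the fraction of its members that come from the accident group. The cluster labels are assumed to be given; the clustering procedure itself is not reproduced here. The function name is hypothetical.

```python
from collections import Counter


def accident_proportion(cluster_labels, from_accident_group):
    """Return {cluster: P}, P being the fraction of the cluster's members
    that come from the accident group.

    cluster_labels: one cluster label per pilot.
    from_accident_group: one boolean per pilot, True for the accident group.
    """
    size = Counter(cluster_labels)
    accidents = Counter(
        label for label, acc in zip(cluster_labels, from_accident_group) if acc
    )
    # A Counter returns 0 for a missing label, so clusters with no accident pilots get P = 0.0
    return {label: accidents[label] / size[label] for label in size}
```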
APPENDIX F
CORRELATION MATRIX

The product-moment correlation between two variables, X_i and X_j, is given by

$$ \rho_{ij} = \frac{\mathrm{Cov}(X_i, X_j)}{\left[\mathrm{Var}(X_i)\,\mathrm{Var}(X_j)\right]^{1/2}} $$

An element of the correlation matrix can be calculated by the equation

$$ r_{ij} = \frac{\sum_{k} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\left[\sum_{k} (x_{ik} - \bar{x}_i)^2 \sum_{k} (x_{jk} - \bar{x}_j)^2\right]^{1/2}} $$
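The second equation can be sketched in NumPy. The sketch assumes the data for all pilots (both groups stacked, 100 rows) are held in a 100 × 10 array. It should agree with `np.corrcoef(x, rowvar=False)`. Because correlations are unchanged when each variable is standardized, raw and standardized data give the same matrix. The function name is illustrative.

```python
import numpy as np


def correlation_matrix(x):
    """r_ij = sum_k (x_ik - xbar_i)(x_jk - xbar_j)
              / [sum_k (x_ik - xbar_i)^2 * sum_k (x_jk - xbar_j)^2]^(1/2)

    x: array of shape (N, m), one row per observation.
    """
    x = np.asarray(x, dtype=float)
    dev = x - x.mean(axis=0)
    cross = dev.T @ dev                    # numerator for every (i, j) pair
    root_ss = np.sqrt(np.diag(cross))      # [sum_k (x_ik - xbar_i)^2]^(1/2) per variable
    return cross / np.outer(root_ss, root_ss)
```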


TABLE VIII - Correlation Matrix for all Data

Variable          1          2          3          4          5
       1     1.0000     0.8494    -0.2080     0.1161    -0.0378
       2     0.8494     1.0000    -0.0961     0.0539    -0.0288
       3    -0.2080    -0.0961     1.0000    -0.4616     0.3111
       4     0.1161     0.0539    -0.4616     1.0000    -0.3818
       5    -0.0378    -0.0288     0.3111    -0.3818     1.0000
       6    -0.0497    -0.0558     0.3516    -0.4539     0.8682
       7    -0.0710    -0.0817     0.2644    -0.5147     0.7092
       8    -0.1163    -0.1464     0.3308    -0.5114     0.5596
       9    -0.1436    -0.1164     0.5284    -0.2957     0.1851
      10    -0.0852    -0.0577     0.4864    -0.1973     0.1932

Variable          6          7          8          9         10
       1    -0.0497    -0.0710    -0.1163    -0.1436    -0.0852
       2    -0.0558    -0.0817    -0.1464    -0.1164    -0.0577
       3     0.3516     0.2644     0.3308     0.5284     0.4864
       4    -0.4539    -0.5147    -0.5114    -0.2957    -0.1973
       5     0.8682     0.7092     0.5596     0.1851     0.1932
       6     1.0000     0.7058     0.7591     0.1520     0.1370
       7     0.7058     1.0000     0.8499     0.0836     0.0251
       8     0.7591     0.8499     1.0000     0.1726     0.0994
       9     0.1520     0.0836     0.1726     1.0000     0.8169
      10     0.1370     0.0251     0.0994     0.8169     1.0000
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APPENDIX  G 
TABLE IX - Principal Components for Control Group*

         Variables     Variables                 Variables     Variables
Obs.       1 & 2      5, 6, 7, & 8     Obs.        1 & 2      5, 6, 7, & 8

•i 

1.6699 

-1.3798 

26 

0.5332 

-1.9888 

2 

-1.379V 

6. 3 810 

27 

0.1328 

-0.%91 

s 

-0.1867 

-O.1V88 

28 

0.862V 

1.V137 

-1.765° 

-1.119V 

29 

O.23VI 

-0.6^11 

? 

_t  .98V5 

1.9^68 

30 

0.92"5-! 

-^.o^Vl 

o 

-x.B^e 

-1.8615 

31 

1.1103 

-1.2V11 

7 

0.5M 

-O.V092 

32 

1.5795 

O.238I 

8 

0.2131 

-0.6300 
0.2%5 

11 

0.9230 

0.3750 

9 

0.6 90V 

0.2063 

-O.883V 

10 

1.17V8 

-1.372V 

35 

■   -1.6680 

1.3195 

11 

1.5850 

0.32.V8 

36 

-0.7722 

-1.7726 

12 

0.8035 

-1.8672 

37 

-I.19HV 

-I.3782 

1? 

l.lVl? 

0.8991 

38 

-I.03VI 

-2.6922 

l^f 

0.7133 

V.5973 

39 

0.2U69 

I.7338-. 

15 

1.5827 

0.9013 

1+0 

-O.9079 

-0.19V1 

16 

1.6013 

1.5121 

Vl 

-0.0550 

3.973V 

17 

1.6V67 

O.V627 

V2 

0.7382 

-2.9538 

18 

1.V937 

O.9V33 

j? 

0.9838 

-2.51^ 

19 

-2.0501 

O.O80V 

Lif 

1.1878 

-1.5082 

20 

-I.722V 

-O.2V06 

V5 

1.1599 

-0.6775 

21 

-1.3396 

-2.V2.69 

V6 

0.6936 

0.2885 

22 

-O.8V3V 

-1.2893 

h7 

-1.7383 

1.9168 

23 

-3.60V6 

0.711^ 

V8 

-0.158V 

0.55V5 

2V 

.^5.^636 

-0.2801 

1+9 

-2.1592 

1.523V 

25    .. 

I.V983 

1.3292 

50 

-2.3989 

-0.1913 

*Note: The principal components were extracted from standardized data.
TABLE X - Principal Components for Accident Group*

         Variables     Variables                 Variables     Variables
Obs.       1 & 2      5, 6, 7, & 8     Obs.        1 & 2      5, 6, 7, & 8


1 
2 


5 

6 

7 
8 

9 

10 

11 

12 

13 
ih 

15 
16 

17 
18 

19 
20 
21 
22 

2? 
2h 

25 


1.3880 
•0*2^61 

0.1399 
0.8553 
1.3791 
1.5297 

-1.3378 
0.7^25 

-2.925+2 

-0.6633 
0.1599 
0.755+*+ 
0.0578 
1.3880 

-1.5289 

1.102** 

0.7309 
1.5+15+7 

0.7625 

1.1120 

-2 .  7lV7 

0 16060 

-0.855+2 
0. 725+7 
0.0150 


-0.9997 
-0.5+5+06 
-0.9650 
-0.0562 

2.37^3 

-1.0603 

1.8637 

-1.8509 

0.0391 

-2.3258 

2A339 
0.7056 
1.9759 

0.025+2 

0.85+63 

2.5191 

-2.3258 

0.3751 
-0.5+883 
-1.7367 
-1.0970 

3.0390 

1.5+120 

2.0216 

-1.5083 


26 

27 

28 

29 

30 

31 
32 

3? 

35+ 

25 
36 
37 
38 
39 

5+0 
5+1 
5+2 

5+3 
5+5+ 

5+5 
5+6 

h7 

1+8 

5+9 

50 


1-3591 

0.5521 

0.7381 

-1.935+0 

-2.7369 

0.2065 

-1.57^0 

-2.9702 

0.2316 

l.55'+o 

1 .05+5+8 

0.6366 

-1.6751 

-1.85+5+6 

0.8336 
0.5915+ 
0.3082 

-3.1236 

0.285+5+ 

1.165+7 
0.29-32 

-2.125+6 
1.3280 
0.7825 

1.5+791 


1.6810 
.  0.0758 
-2.3258 
-2.^258 
-2.3258 

0.905+2 

-O.775+3 

-2.3258 

3.9012 

-0.9173 

-2.3258 

1.5398 

0.1790 

3. 6157 

-2.3258 

-0.9173 
1.5398 
-O.5S36 
-O.2023 
-1.35+85 
2.65+27 

-1.1079 

-2.3258 

O.5+909 

1.3871 


*Note: The principal components were extracted from standardized data.
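Both principal component tables report one value per pilot for variables 1 & 2 and one for variables 5, 6, 7, & 8. One plausible reading is that each value is the pilot's score on the leading principal component of that subset of standardized variables. This is an assumption, since the extraction steps are not given here. The sketch below computes such scores with an eigen-decomposition. The function name is hypothetical, and the thesis may have scaled or signed the scores differently.

```python
import numpy as np


def leading_component_scores(z):
    """Scores on the first principal component of already-standardized data.

    z: array of shape (N, p), p >= 2, each column standardized
       (mean 0, standard deviation 1 with the N - 1 divisor).
    Returns (scores, eigenvalue). An eigenvector's sign is arbitrary,
    so scores may have the opposite sign to a published table.
    """
    z = np.asarray(z, dtype=float)
    r = np.cov(z, rowvar=False)                     # the correlation matrix, for standardized columns
    eigenvalues, eigenvectors = np.linalg.eigh(r)   # eigh returns eigenvalues in ascending order
    leading = eigenvectors[:, -1]                   # unit eigenvector of the largest eigenvalue
    return z @ leading, eigenvalues[-1]
```

For example, with a hypothetical standardized 50 × 10 array `z_control`, `leading_component_scores(z_control[:, 0:2])` would give scores for variables 1 & 2, and `leading_component_scores(z_control[:, 4:8])` would give scores for variables 5 through 8 (NumPy numbers columns from 0).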




APPENDIX  H 
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